
Volume 3, number 1

What are confidence intervals?

Sponsored by an educational grant from AVENTIS Pharma

Huw T O Davies MA, MSc, PhD, HonMFPHM
Reader in Health Care Policy and Management, University of St Andrews

● A confidence interval calculated for a measure of treatment effect shows a range within which the true treatment effect is likely to lie.

● Confidence intervals are preferable to p-values, as they tell us the range of possible effect sizes compatible with the data.

● A confidence interval that embraces the value of no difference indicates that the treatment under investigation is not significantly different from the control.

● Confidence intervals aid interpretation of clinical trial data by putting upper and lower bounds on the likely size of any true effect.

● Bias must be assessed before confidence intervals can be interpreted. Even very large samples and very narrow confidence intervals can mislead if they come from biased studies.

● Non-significance does not mean ‘no effect’. Small studies will often report non-significance even when there are important, real effects.

● Statistical significance does not necessarily mean that the effect is real: by chance alone about one in 20 significant findings will be spurious.

● Statistical significance does not necessarily mean clinically important. It is the size of the effect that determines the importance, not the presence of statistical significance.

www.evidence-based-medicine.co.uk


Measuring effect size

Clinical trials aim to generate new knowledge on the effectiveness (or otherwise) of healthcare interventions. Like all clinical research, this involves estimating a key parameter of interest, in this case the effect size. The effect size can be measured in a variety of ways, such as the relative risk reduction, the absolute risk reduction or the number needed to treat (NNT) (Table 1). Relative measures tend to emphasise potential benefits, whereas absolute measures provide an across-the-board summary.1 Either may be appropriate, subject to correct interpretation.

Whatever the measure used, some assessment must be made of the trustworthiness or robustness of the findings. The findings of the study provide a point estimate of effect, and this raises a dilemma: are the findings discovered about the sample also likely to be true about other similar groups of patients? Before we can answer such a question, two issues need to be addressed. Does any apparent treatment benefit arise because of the way the study has been conducted (bias), or could it arise simply because of chance? The short note below briefly covers the importance of assessing bias but focuses more on assessing the role of chance.

Table 1: Summary of effect measures

Absolute risk reduction (ARR)
Absolute change in risk: the risk of an event in the control group minus the risk of an event in the treated group; usually expressed as a percentage.
No effect: ARR=0%. Total success: ARR=initial risk.

Relative risk reduction (RRR)
Proportion of the risk removed by treatment: the absolute risk reduction divided by the initial risk in the control group; usually expressed as a percentage.
No effect: RRR=0%. Total success: RRR=100%.

Relative risk (RR)
The risk of an event in the treated group divided by the risk of an event in the control group; usually expressed as a decimal proportion, sometimes as a percentage.
No effect: RR=1 (or RR=100%). Total success: RR=0.

Odds ratio (OR)
Odds of an event in the treated group divided by the odds of an event in the control group; usually expressed as a decimal proportion.
No effect: OR=1. Total success: OR=0.

Number needed to treat (NNT)
Number of patients who need to be treated to prevent one event; this is the reciprocal of the absolute risk reduction (when expressed as a decimal fraction); it is usually rounded to a whole number.
No effect: NNT=∞. Total success: NNT=1/initial risk.
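For readers who want to see the arithmetic behind Table 1, the short Python sketch below computes each measure from a single pair of risks. The 20% control-group risk and 15% treated-group risk are invented for the illustration and are not taken from any trial discussed in this article.

```python
# Illustrative only: the control and treated risks below are invented for the
# example and are not taken from any trial discussed in this article.
control_risk = 0.20   # risk of an event in the control group
treated_risk = 0.15   # risk of an event in the treated group

arr = control_risk - treated_risk     # absolute risk reduction
rrr = arr / control_risk              # relative risk reduction
rr = treated_risk / control_risk      # relative risk
odds_ratio = (treated_risk / (1 - treated_risk)) / (control_risk / (1 - control_risk))
nnt = 1 / arr                         # number needed to treat

print(f"ARR = {arr:.1%}, RRR = {rrr:.1%}, RR = {rr:.2f}, "
      f"OR = {odds_ratio:.2f}, NNT = {nnt:.0f}")
```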


Bias

Bias is a term that covers any systematic errors in the way the study was designed, executed or interpreted. Common flaws in treatment trials are: lack of (or failure in) randomisation, leading to unbalanced groups; poor blinding, leading to unfair treatment and biased assessments; and large numbers of patients lost to follow-up. Assessment in these areas is crucial before the results from any trial can be assessed, and many useful guides exist to assist this process.2–5 Interpretation of the effects of chance is only meaningful once bias has been excluded as an explanation for any observed differences.6,7

Chance variability

Until comparatively recently, assessments of the role of chance were routinely made using hypothesis testing, which produces a ‘p-value’ (Box 1). The p-value allows assessment of whether or not the findings are ‘significantly different’ or ‘not significantly different’ from some reference value (in trials, this is usually the value reflecting ‘no effect’; see Table 1). More recently, a different and more useful approach to assessing the role of chance has come to the fore: confidence intervals.8 Although these might appear rather dissimilar to p-values, the theory and calculations underlying these two approaches are largely the same.

What are confidence intervals?

Confidence intervals provide different information from that arising from hypothesis tests. Hypothesis testing produces a decision about any observed difference: either that the difference is ‘statistically significant’ or that it is ‘statistically non-significant’.

Box 1: Hypothesis testing and the generation of p-values


The logic of hypothesis testing and p-values is convoluted. Suppose a new treatment appears to
outperform the standard therapy in a research study. We are interested in assessing whether this
apparent effect is likely to be real or could just be a chance finding: p-values help us to do this.
In calculating the p-value, we first assume that there really is no true difference between the
two treatments (this is called the null hypothesis). We then calculate how likely we are to see
the difference that we have observed just by chance if our supposition is true (that is, if there is
really no true difference). This is the p-value.
So the p-value is the probability that we would observe effects as big as those seen in the
study if there was really no difference between the treatments. If p is small, the findings are
unlikely to have arisen by chance and we reject the idea that there is no difference between the
two treatments (we reject the null hypothesis). If p is large, the observed difference is plausibly a
chance finding and we do not reject the idea that there is no difference between the treatments.
Note that we do not reject the idea, but we do not accept it either: we are simply unable to say
one way or another until other factors have been considered.
But what do we mean by a ‘small’ p-value (one small enough to cause us to reject the idea
that there was really no difference)? By convention, p-values of less than 0.05 are considered
‘small’. That is, if p is less than 0.05 there is a less than one in 20 chance that a difference as big
as that seen in the study could have arisen by chance if there was really no true difference. With
p-values this small (or smaller) we say that the results from the trial are statistically significant
(unlikely to have arisen by chance). Smaller p-values (say p<0.01) are sometimes called ‘highly
significant’ because they indicate that the observed difference would happen less than once in a
hundred times if there was really no true difference.
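As a rough illustration of the logic set out in Box 1, the sketch below calculates a two-sided p-value for a difference in event proportions using a normal approximation. The trial counts are hypothetical, and the pooled z-test shown is just one common choice of test.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Hypothetical trial counts, purely for illustration.
events_new, n_new = 30, 200      # events in the new-treatment group
events_std, n_std = 45, 200      # events in the standard-treatment group

p_new, p_std = events_new / n_new, events_std / n_std

# Under the null hypothesis of no true difference, pool the two groups to
# estimate the standard error of the observed difference in proportions.
p_pool = (events_new + events_std) / (n_new + n_std)
se_null = math.sqrt(p_pool * (1 - p_pool) * (1 / n_new + 1 / n_std))

z = (p_new - p_std) / se_null
p_value = 2 * (1 - normal_cdf(abs(z)))   # two-sided p-value

print(f"risk difference = {p_new - p_std:+.3f}, z = {z:.2f}, p = {p_value:.3f}")
```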


In contrast, confidence intervals provide a range about the observed effect size. This range is constructed in such a way that we know how likely it is to capture the true – but unknown – effect size.

Thus the formal definition of a confidence interval is: ‘a range of values for a variable of interest [in our case, the measure of treatment effect] constructed so that this range has a specified probability of including the true value of the variable. The specified probability is called the confidence level, and the end points of the confidence interval are called the confidence limits’.9

It is a widespread convention to create confidence intervals at the 95% level – so this means that 95% of the time properly constructed confidence intervals should contain the true value of the variable of interest. Hence, more colloquially, the confidence interval provides a range for our estimate of the true treatment effect that is plausible given the size of the difference actually observed.
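The ‘95% of the time’ claim can be checked by simulation. The sketch below repeatedly draws samples from a population with a known event risk, builds a conventional (Wald) 95% confidence interval for the risk each time, and counts how often the interval captures the true value. The population risk, sample size and interval method are assumptions chosen purely for illustration.

```python
import math
import random

random.seed(1)

TRUE_RISK = 0.20      # 'true' event risk in the population (assumed)
N = 500               # patients per simulated study (assumed)
N_STUDIES = 10_000    # number of simulated studies

covered = 0
for _ in range(N_STUDIES):
    events = sum(random.random() < TRUE_RISK for _ in range(N))
    p_hat = events / N
    se = math.sqrt(p_hat * (1 - p_hat) / N)       # standard error of the estimate
    lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
    if lower <= TRUE_RISK <= upper:
        covered += 1

print(f"{covered / N_STUDIES:.1%} of the intervals contained the true risk")
# Expect a figure close to 95%.
```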
Assessing significance from a confidence interval

One useful feature of confidence intervals is that one can easily tell whether or not statistical significance has been reached, just as in a hypothesis test.

● If the confidence interval captures the value reflecting ‘no effect’, this represents a difference that is statistically non-significant (for a 95% confidence interval, this is non-significance at the 5% level).

● If the confidence interval does not enclose the value reflecting ‘no effect’, this represents a difference that is statistically significant (again, for a 95% confidence interval, this is significance at the 5% level).

Thus ‘statistical significance’ can be inferred from confidence intervals – but, in addition, these intervals show the largest and smallest effects that are likely given the observed data. This is useful extra information. An example of the use of confidence intervals is shown in Box 2.

Examining the width of a confidence interval

One of the advantages of confidence intervals over traditional hypothesis testing is the additional information that they convey. The upper and lower bounds of the interval give us information on how big or small the true effect might plausibly be, and the width of the confidence interval also conveys some useful information.

If the confidence interval is narrow, capturing only a small range of effect sizes, we can be quite confident that any effects far from this range have been ruled out by the study. This situation usually arises when the size of the study is quite large and hence the estimate of the true effect is quite precise. Another way of saying this is to note that the study has reasonable ‘power’ to detect an effect. However, if the confidence interval is quite wide, capturing a diverse range of effect sizes, we can infer that the study was probably quite small. Thus any estimates of effect size will be quite imprecise. Such a study is ‘low powered’ and provides us with less information.
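The link between study size and the width of the interval is easy to demonstrate. The sketch below computes a Wald 95% confidence interval for a single proportion at several sample sizes, holding the observed risk fixed at an illustrative 20%.

```python
import math

P_HAT = 0.20   # observed event risk, held fixed for illustration

for n in (25, 100, 400, 1600, 6400):
    se = math.sqrt(P_HAT * (1 - P_HAT) / n)
    lower, upper = P_HAT - 1.96 * se, P_HAT + 1.96 * se
    print(f"n={n:5d}  95% CI {lower:.3f} to {upper:.3f}  (width {upper - lower:.3f})")
# Quadrupling the sample size roughly halves the width of the interval.
```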
Errors in interpretation

Confidence intervals, like p-values, provide us with a guide to help with the interpretation of research findings in the light of the effects of chance. There are, however, three important pitfalls in interpretation.

Getting it wrong: seeing effects that are not real

First of all, we may examine the confidence interval and observe that the difference is ‘statistically significant’. From this we will usually conclude that there is a difference between the two treatments. However, just because we are unlikely to observe such a large difference simply by chance, this does not mean that it will not happen. By definition, about one in 20 significant findings will be spurious – arising simply from chance. Thus we may be misled by chance into believing in something that is not real – technically, this is called a ‘type I error’.
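The ‘one in 20’ figure can likewise be demonstrated by simulation. The sketch below repeatedly compares two groups drawn from the same population – so there is genuinely no treatment effect – and counts how often a 95% confidence interval for the risk difference excludes zero. The group size and event risk are arbitrary choices for the illustration.

```python
import math
import random

random.seed(2)

RISK = 0.30          # identical true risk in both groups (no real effect)
N = 300              # patients per group (assumed)
N_TRIALS = 5_000     # number of simulated comparisons

false_positives = 0
for _ in range(N_TRIALS):
    a = sum(random.random() < RISK for _ in range(N)) / N   # 'treated' risk
    b = sum(random.random() < RISK for _ in range(N)) / N   # 'control' risk
    se = math.sqrt(a * (1 - a) / N + b * (1 - b) / N)
    lower, upper = (a - b) - 1.96 * se, (a - b) + 1.96 * se
    if lower > 0 or upper < 0:        # interval excludes 'no difference'
        false_positives += 1

print(f"'significant' differences found in {false_positives / N_TRIALS:.1%} of comparisons")
# With no true difference at all, roughly 5% of comparisons still come out significant.
```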


Box 2: An example of the use of confidence intervals


Ramipril is an angiotensin-converting enzyme (ACE) inhibitor which has been tested for use
in patients at high risk of cardiovascular events. In one study published in the New England
Journal of Medicine,10 a total of 9,297 patients were recruited into a randomised,
double-blind, controlled trial. The key findings presented on the primary outcome and deaths
were as follows:

Incidence of primary outcome and deaths from any cause

Outcome                                   Ramipril group (n=4,645)   Placebo group (n=4,652)   Relative risk (95% CI)
                                          number (%)                 number (%)
Cardiovascular event (including death)    651 (14.0)                 826 (17.8)                 0.78 (0.70–0.86)
Death from non-cardiovascular cause       200 (4.3)                  192 (4.1)                  1.03 (0.85–1.26)
Death from any cause                      482 (10.4)                 569 (12.2)                 0.84 (0.75–0.95)

These data indicate that fewer people treated with ramipril suffered a cardiovascular
event (14.0%) compared with those in the placebo group (17.8%). This gives a relative
risk of 0.78, or a reduction in (relative) risk of 22%. The 95% confidence interval for this
estimate of the relative risk runs from 0.70 to 0.86. Two observations can then be made from
this confidence interval:
● First, the observed difference is statistically significant at the 5% level, because the interval
does not embrace a relative risk of one.
● Second, the observed data are consistent with as much as a 30% reduction in relative risk or
as little as a 14% reduction in risk.
Similarly, the last row of the table shows that statistically significant reductions in the
overall death rate were recorded: a relative risk of 0.84 with a confidence interval running
from 0.75 to 0.95. Thus the true reduction in deaths may be as much as a quarter or it could
be only as little as 5%; however, we are 95% certain that the overall death rate is reduced in
the ramipril group.
Finally, exploring the data presented in the middle row shows an example of how a
confidence interval can demonstrate non-significance. There were a few more deaths
from non-cardiovascular causes in the ramipril group (200) compared with the placebo group
(192). Because of this, the relative risk is calculated to be 1.03 – showing a slight increase in risk
in the ramipril group. However, the confidence interval is seen to capture the value of no effect
(relative risk = 1), running as it does from 0.85 to 1.26. The observed difference is thus non-
significant; the true value could be anything from a 15% reduction in non-cardiovascular deaths
for ramipril to a 26% increase in these deaths. Not only do we know that the result is not
significant, but we can also see how large or small a true difference might plausibly be, given
these data.
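The relative risk and confidence interval in the first row of the table can be approximately reproduced from the raw counts. The sketch below uses the usual normal approximation on the log relative risk scale; this is one standard method, not necessarily the one used by the trial statisticians, so the limits come out close to – but not exactly at – the published 0.70 to 0.86.

```python
import math

# Counts for the primary outcome, taken from the table in Box 2.
events_ramipril, n_ramipril = 651, 4645
events_placebo, n_placebo = 826, 4652

risk_ramipril = events_ramipril / n_ramipril
risk_placebo = events_placebo / n_placebo
rr = risk_ramipril / risk_placebo

# Standard error of log(RR) under the usual large-sample approximation.
se_log_rr = math.sqrt(1 / events_ramipril - 1 / n_ramipril
                      + 1 / events_placebo - 1 / n_placebo)
lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
upper = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# Prints roughly RR = 0.79, 95% CI 0.72 to 0.87 - close to the published 0.78 (0.70-0.86).
```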


It is a frustrating but unavoidable feature of statistical significance (whether assessed using confidence intervals or p-values) that around one in 20 will mislead. Yet we cannot know which of any given set of comparisons is doing the misleading. This observation cautions against generating too many statistical comparisons: the more comparisons made in any given study the greater the chance that at least some of them will be spurious findings. Thus clinical trials which show significance in only one or two subgroups are unconvincing – such significance may be deceptive. Unless particular subgroup analyses have been specified in advance, differences other than for the primary endpoint for the whole group should be viewed with suspicion.

Statistical significance and clinical significance

Statistical significance is also sometimes misinterpreted as signifying an important result: this is a second important pitfall in interpretation. Significance testing simply asks whether the data produced in a study are compatible with the notion of no difference between the new and control interventions. Rejecting equivalence of the two interventions does not necessarily mean that we accept that there is an important difference between them. A large study may identify as statistically significant a fairly small difference. It is then quite a separate judgement to assess the clinical significance of this difference. In assessing the importance of significant results, it is the size of the effect not just the size of the significance that matters.

Getting it wrong again: failing to find real effects

A further error that we may make is to conclude from a non-significant finding that there is no effect, when in fact there is – this is called a ‘type II error’. Equating non-significance with ‘no effect’ is a frequent and damaging misconception. A non-significant confidence interval simply tells us that the observed difference is consistent with there being no true difference between the two groups. Thus we are unable to reject this possibility. This is where confidence intervals are much more helpful than simple p-values: the observed difference will also be compatible with a range of other effect sizes as described by the confidence interval.8 We are unable to reject these possibilities either and must then assess whether some of these possibilities might be important. Just because we have not found a significant treatment effect does not mean that there is no treatment effect to be found.11 The crucial question is ‘how hard have we looked?’.
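The danger of small studies can be shown numerically. The sketch below assumes a genuine 25% relative risk reduction (risks of 15% versus 20%, the same illustrative figures used earlier) observed exactly as expected, and contrasts the 95% confidence interval for the relative risk in a small trial with that in a large one; the sample sizes are arbitrary.

```python
import math

def rr_with_ci(events_t, n_t, events_c, n_c):
    """Relative risk and approximate 95% CI via the log(RR) normal approximation."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, math.exp(math.log(rr) - 1.96 * se), math.exp(math.log(rr) + 1.96 * se)

# A genuine 25% relative risk reduction (risks 15% vs 20%), observed exactly,
# in a small trial and in a large one. Figures are illustrative only.
for n_per_group in (100, 2000):
    events_t = round(0.15 * n_per_group)   # treated-group events
    events_c = round(0.20 * n_per_group)   # control-group events
    rr, lo, hi = rr_with_ci(events_t, n_per_group, events_c, n_per_group)
    verdict = "significant" if hi < 1 or lo > 1 else "non-significant"
    print(f"n={n_per_group} per group: RR={rr:.2f}, 95% CI {lo:.2f} to {hi:.2f} ({verdict})")
# The small trial's interval spans 1 (non-significant) despite the real effect;
# the large trial's interval excludes 1.
```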
are compatible with these data?’. The
Getting it wrong again: failing to confidence interval is just such a range, which
find real effects 95% of the time will contain the true value of
A further error that we may make is to the main measure of effect (relative risk
conclude from a non-significant finding that reduction, absolute risk reduction, NNT or
there is no effect, when in fact there is – this whatever; Table 1).
is called a ‘type II error’. Equating non- This allows us to do two things. First, if the
significance with ‘no effect’ is a frequent and confidence interval embraces the value of no
damaging misconception. A non-significant effect (for example, no difference between
confidence interval simply tells us that the two treatments as shown by a relative risk
observed difference is consistent with there equal to one or an absolute difference equal
being no true difference between the two to zero) then the findings are non-significant.
groups. Thus we are unable to reject this If the confidence interval does not embrace
possibility. This is where confidence intervals the value of no difference then the findings

6
What are
confidence intervals?

Thus confidence intervals provide the same information as a p-value. But more than this: the upper and lower extremities of the confidence interval also tell us how large or small the real effect might be and yet still give us the observed findings by chance. This additional information is very helpful in allowing us to interpret both borderline significance and non-significance. Confidence intervals from large studies tend to be quite narrow in width, showing the precision with which the study is able to estimate the size of any real effect. In contrast, confidence intervals from smaller studies are usually wide, showing that the findings are compatible with a wide range of effect sizes.

References

1. Davies HTO. Interpreting measures of treatment effect. Hosp Med 1998; 59: 499–501.
2. Guyatt GH, Sackett DL, Cook DJ. Users’ guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? JAMA 1993; 270: 2598–2601.
3. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. 2nd edn. Boston, Massachusetts: Little, Brown and Company, 1991.
4. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence based medicine: how to practice and teach EBM. London: Churchill Livingstone, 1997.
5. Crombie IK. The pocket guide to critical appraisal. London: BMJ Publishing, 1996.
6. Brennan P, Croft P. Interpreting the results of observational research: chance is not such a fine thing. BMJ 1994; 309: 727–730.
7. Hill A, Spittlehouse C. What is critical appraisal? London: Hayward Medical Communications, 2000 (in press).
8. Gardner MJ, Altman DG. Confidence intervals rather than p values: estimation rather than hypothesis testing. BMJ 1986; 292: 746–750.
9. Last JM. A dictionary of epidemiology. Oxford: International Journal of Epidemiology, 1988.
10. The Heart Outcomes Prevention Evaluation Study Investigators. Effects of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients. N Engl J Med 2000; 342: 145–153.
11. Altman DG, Bland JM. Absence of evidence is not evidence of absence. BMJ 1995; 311: 485.
12. Guyatt GH, Sackett DL, Cook DJ. Users’ guides to the medical literature. II. How to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients? JAMA 1994; 271: 59–63.

Prescribing information: Tritace®
Presentation: Capsules containing 1.25mg, 2.5mg, 5mg or 10mg ramipril.
Indications: Reducing the risk of myocardial infarction, stroke, cardiovascular death or need for revascularisation procedures in
patients of 55 years or more who have clinical evidence of cardiovascular disease (previous MI, unstable angina or multivessel CABG or
multivessel PTCA), stroke or peripheral vascular disease. Reducing the risk of myocardial infarction, stroke, cardiovascular death or need
for revascularisation procedures in diabetic patients of 55 years or more who have one or more of the following clinical findings:
hypertension (systolic blood pressure >160mmHg or diastolic blood pressure >90mmHg); high total cholesterol (>5.2mmol/l); low HDL (<0.9mmol/l); current smoker; known microalbuminuria; clinical evidence of previous vascular disease. Mild to moderate
hypertension. Congestive heart failure as adjunctive therapy to diuretics with or without cardiac glycosides. Reduction in mortality in
patients surviving acute MI with clinical evidence of heart failure.
Dosage and administration: Reduction in risk of MI, stroke, cardiovascular death or need for revascularisation procedure:
The initial dose is 2.5mg Tritace o.d.. Depending on tolerability, the dose should be gradually increased. It is recommended that this
dose is doubled after about 1 week of treatment then, after a further 3 weeks, increased to 10mg. The usual maintenance dose is 10mg
Tritace o.d.. Patients stabilised on lower doses of Tritace for other indications where possible should be titrated to 10mg Tritace o.d..
Hypertension: Initial dose 1.25mg titrated up to a maximum of 10mg o.d. according to response. Usual dose 2.5mg or 5mg o.d.. Stop
diuretic therapy 2 - 3 days before starting Tritace and resume later if required. Congestive heart failure: Initial dose 1.25mg o.d.
titrated up to a maximum of 10mg daily according to response. Doses above 2.5mg daily can be given o.d. or b.d.. Post Myocardial
Infarction: Initiate treatment between day 3 and day 10 following AMI. Initially 2.5mg b.d. increasing to 5mg b.d after 2 days.
Assessment of renal function is recommended prior to initiation. Reduced maintenance dose may be required in impaired renal
function. Monitor patients with impaired liver function. In the elderly the dose should be titrated according to need. Not recommended
for children.
Contraindications: Hypersensitivity to ramipril or excipients, history of angioneurotic oedema, haemodynamically relevant renal
artery stenosis, hypotensive or haemodynamically unstable patients, pregnancy, lactation.
Precautions: Do not use in aortic or mitral valve stenosis or outflow obstruction. Assess renal function before and during use, as there
is a risk of impairment of renal function. Use with caution during surgery or anaesthesia. Hyperkalaemia. Do not use in patients using
polyacrylonitrile (AN69) dialysis membranes or during low-density lipoprotein apheresis with dextran sulphate. Agranulocytosis and
bone marrow depression seen rarely with ACE inhibitors as well as a reduction in red cell count, haemoglobin and platelet content.
Symptomatic hypotension may occur after initial dose or increase in dose, especially in salt/volume depleted patients.
Drug interactions: Combination with diuretics, NSAIDs, adrenergic blocking drugs or other antihypertensive agents may potentiate
antihypertensive effect. Risk of hyperkalaemia when used with agents increasing serum potassium. May enhance the effect of
antidiabetic agents. May increase serum lithium concentrations.
Side effects: Dizziness, headache, weakness, disturbed balance, nervousness, restlessness, tremor, sleep disorders, confusion, loss of
appetite, depressed mood, anxiety, paraesthesiae, taste changes, muscle cramps & joint pain, erectile impotence, reduced sexual desire,
fatigue, cough, hypersensitivity reactions; pruritus, rash, shortness of breath, fever, cutaneous and mucosal reactions, Raynaud’s
phenomenon, gastrointestinal disturbances, jaundice, hepatitis, impaired renal function, angioneurotic oedema, pancreatitis,
eosinophilia and vasculitis. Symptomatic hypotension, myocardial infarction or cerebrovascular accident possibly secondary to severe
hypotension in high-risk patients, chest pain, palpitations, rhythm disturbances, angina pectoris may occur. Use with caution and
closely monitor patients with impaired liver function. Reduced serum sodium levels, elevated blood urea nitrogen and serum creatinine.
Pre-existing proteinuria may deteriorate.
Basic NHS cost: 28 x 1.25mg capsules £5.30; 28 x 2.5mg capsules £7.51; 28 x 5mg capsules £9.55; 28 x 10mg capsules £13.00
Marketing authorisation numbers: 1.25mg 13402/0021, 2.5mg 13402/0022, 5mg 13402/0023, 10mg 13402/0024
Legal category: POM
Marketing authorisation holder: Aventis Pharma, 50 Kings Hill Avenue, Kings Hill, West Malling, Kent, ME19 4AH
Date of preparation: July 2000

This publication, along with the others in the series, is available on the internet at www.evidence-based-medicine.co.uk

Published by Hayward Medical Communications, a division of Hayward Group plc.

Sponsored by an educational grant from AVENTIS Pharma

Copyright © 2001 Hayward Group plc. All rights reserved.

TRI1151200
Date of preparation: February 2001