CAT Summary: Understanding Diagnostic Tests
Sensitivity, Specificity, NPV, & PPV
Sensitivity: proportion of people with the disease who will have a + result
Specificity: proportion of people without the disease who will have a − result
PPV: proportion of people with a + test result who actually have the disease
NPV: proportion of people with a − test result who do not have the disease
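As a minimal sketch, the four measures can be computed directly from a 2×2 table of test result against true disease status (the counts below are made up for illustration, not from the text):

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table."""
    sensitivity = tp / (tp + fn)  # + results among those with disease
    specificity = tn / (tn + fp)  # - results among those without disease
    ppv = tp / (tp + fp)          # disease among those with a + result
    npv = tn / (tn + fn)          # no disease among those with a - result
    return sensitivity, specificity, ppv, npv

# Illustrative counts only:
sens, spec, ppv, npv = diagnostic_measures(tp=90, fp=50, fn=10, tn=850)
```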
1 Sensitivity tells us nothing about whether or not some people without the
disease would also test + and, if so, in what proportion.
2 A test with a high sensitivity is useful for ‘ruling out’ a disease if a person tests −
3 High sensitivity helps eliminate false negatives
4 Specificity tells us nothing about whether or not some people with the disease
would also have a negative result and, if so, in what proportion.
5 A test with a high specificity is useful for ‘ruling in’ a disease if a person tests +
6 High specificity helps eliminate false positives
7 Both sensitivity & specificity are of no practical use when it comes to helping the
clinician estimate the probability of disease in individual patients
That is because they are defined on the basis of people with or without a disease,
while real patients present with symptoms rather than a diagnosed disease.
8 Positive & Negative Predictive Values describe a patient’s probability of having
disease once the results of his or her tests are known.
9 Posttest probability of disease given a negative test is not the same as the NPV
but is its complement (1 − NPV).
10 When the prevalence increases, PPV increases & NPV decreases.
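Note 10 can be checked numerically. The sketch below derives PPV and NPV from sensitivity, specificity and prevalence; the test characteristics (sens = spec = 0.9) and the two prevalences are hypothetical:

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV for a test applied at a given disease prevalence."""
    tp = sens * prevalence              # true positives (as population fractions)
    fp = (1 - spec) * (1 - prevalence)  # false positives
    tn = spec * (1 - prevalence)        # true negatives
    fn = (1 - sens) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# The same hypothetical test at a low and a high prevalence:
ppv_low, npv_low = predictive_values(0.9, 0.9, 0.01)
ppv_high, npv_high = predictive_values(0.9, 0.9, 0.30)
# PPV rises and NPV falls as prevalence increases.
```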
LIKELIHOOD RATIO
> Defined as the ratio between the probability of observing that result in patients with the disease in question, and the probability of that result in patients
without the disease.
> Likelihood ratios are clinically more useful; they provide a summary of how many times more (or less) likely patients with the disease are to have that particular result
than patients without the disease, and they can also be used to calculate the probability of disease for individual patients.
LR+ = sensitivity / (1 − specificity)
LR− = (1 − sensitivity) / specificity
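The two likelihood-ratio formulas can be sketched and checked numerically (the sensitivity and specificity values below are illustrative):

```python
def likelihood_ratios(sens, spec):
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec

# Hypothetical test with sensitivity 0.95 and specificity 0.85:
lr_pos, lr_neg = likelihood_ratios(0.95, 0.85)
# lr_pos ~ 6.3: a + result is about six times likelier in the diseased.
# lr_neg ~ 0.06: a - result is rare in the diseased.
```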
notes
1 LR+s greater than 1 mean that a positive test is more likely to occur in people with the disease than in people without the disease.
2 LR+s less than 1 mean that a positive test is less likely to occur in people with the disease compared to people without the disease.
3 LR+s of more than 10 significantly increase the probability of disease (‘rule in’ disease) whilst very low LR+s (below 0.1) virtually rule out the chance
that a person has the disease.
4 LR−s greater than 1 mean that a negative test is more likely to occur in people with the disease than in people without the disease.
5 LR−s less than 1 mean that a negative test is less likely to occur in people with the disease compared to people without the disease.
6 LR−s of more than 10 significantly increase the probability of disease (rule in disease) whilst a very low LR− (below 0.1) virtually rules out the chance that
a person has the disease.
Fagan’s nomogram
> Fagan’s nomogram is a graphical tool which, in routine clinical practice, allows one to use the results of a diagnostic
test to estimate a patient’s probability of having disease.
> In this nomogram, a straight line drawn from patient’s pretest probability of disease (left axis) through the likelihood ratio
of the test (middle axis) will intersect with the posttest probability of disease (right axis).
> STEPS:
1) Convert pretest probability (prevalence) to pretest odds: Pretest odds = prevalence/(1 − prevalence)
2) Multiply pretest odds by the likelihood ratio to obtain posttest odds: Pretest odds × likelihood ratio = posttest odds
3) Convert posttest odds to posttest probability (predictive value): Posttest probability = posttest odds/(1 + posttest odds)
OR Place a straight edge at the correct prevalence & LR values & read off the posttest probability where the straight edge
crosses the line.
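The three conversion steps above can be sketched in a few lines; the 20% pretest probability and the LR+ of 6 are hypothetical worked-example values:

```python
def posttest_probability(pretest_prob, lr):
    """Steps 1-3: probability -> odds, multiply by LR, odds -> probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

# A 20% pretest probability with a hypothetical LR+ of 6:
p = posttest_probability(0.20, 6.0)  # odds 0.25 x 6 = 1.5 -> probability 0.6
```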
The Receiver Operating Characteristic (ROC) Curve
> is obtained by calculating the sensitivity and specificity of a test at
every possible cutoff point, and plotting sensitivity against 1 − specificity.
note that sensitivity & specificity are generally inversely related.
> may be used to select optimal cutoff values for a test result, to assess
the diagnostic accuracy of a test, and to compare the usefulness of
different tests.
> One method to determine the optimal cutoff point assumes that the
best cutoff point for balancing the sensitivity & specificity of a test is the
point on the curve closest to the (0, 1) point.
{the minimal value for (1 − sensitivity)² + (1 − specificity)²}
> The 2nd method is Youden index (J): the maximum vertical distance
between the ROC curve and the diagonal or chance line.
J = maximum {sensitivity + specificity −1}
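Both cutoff-selection methods can be sketched over a set of candidate cutoffs; the (sensitivity, specificity) pairs below are hypothetical:

```python
# Hypothetical (sensitivity, specificity) pairs at four candidate cutoffs:
points = [(0.99, 0.40), (0.92, 0.70), (0.80, 0.85), (0.60, 0.95)]

# Method 1: cutoff whose ROC point lies closest to the (0, 1) corner,
# i.e. minimal (1 - sensitivity)^2 + (1 - specificity)^2.
closest = min(points, key=lambda p: (1 - p[0]) ** 2 + (1 - p[1]) ** 2)

# Method 2: Youden index, J = max(sensitivity + specificity - 1).
best_j = max(points, key=lambda p: p[0] + p[1] - 1)
```

The two methods often, but not always, select the same cutoff.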
> The area under the curve (AUC) reflects how good the test is at
distinguishing between patients with disease and those without disease.
> AUC > 0.9 = high accuracy, 0.7–0.9 = moderate accuracy, 0.5–0.7 =
low accuracy, 0.5 = chance result.
> It should be noted that all measures of diagnostic accuracy including
the AUC are statistical estimates and should be reported with confidence
intervals.
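When the ROC curve is known only at a few operating points, the AUC is commonly approximated with the trapezoidal rule. A minimal sketch, using two hypothetical operating points:

```python
def auc_trapezoid(roc_points):
    """Trapezoidal AUC over (1 - specificity, sensitivity) points,
    padded with the (0, 0) and (1, 1) corners of ROC space."""
    pts = sorted([(0.0, 0.0)] + roc_points + [(1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Two hypothetical operating points:
auc = auc_trapezoid([(0.15, 0.80), (0.30, 0.92)])  # lands in the 0.7-0.9 band
```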
CAT Summary: Sample Size Selection
Introduction
# The most important aim of a screening or diagnostic study is usually to determine how sensitive
a screening or diagnostic test is in predicting an outcome when both the test and the variable for
clinical diagnosis are presented as dichotomous data.
# An important consideration to be made before conducting any screening or diagnostic study is
to plan and justify a sufficient sample size. This is to ensure that the results obtained from the
subsequent analysis will provide the screening or diagnostic test with a desired minimum value for
both its sensitivity and specificity, together with a sufficient level of power and a sufficiently low
level of type I error (i.e., its corresponding p-value).
# In most instances, the minimum sample size will depend on the objectives of the research study.

Disadvantages of Sample Size Calculations
1 Authors can alter their calculations until the estimated required sample size more or less
matches the expected number of participants.
2 Estimating a required sample size also implies a dichotomy between ‘underpowered’,
uninformative studies and informative studies, while neither the precision of the outcome
measure nor the statistical significance of a result is really dichotomous.
Sample size calculations in diagnostic accuracy studies
1 decide what type of study is planned, and whether the study is hypothesis driven or not.
2 choose the right sample size calculation.
either consult a statistician to do this, or use a sample size calculator.
3 actual sample size calculation
you need to have an idea of the required power (often 80% or 90%) and of the required alpha (the probability of making a Type I error, often 0.05)
4 depending on the study design (e.g. case-control), choose the sample size
case-control design:
2 groups (sampling people with and without the disease separately), each group will contain the estimated sample size
e.g. for disease X with prevalence 10% & sample size of 60, each of your 2 groups should include 60 participants
cohort study: the final sample depends on the expected proportion of diseased people in the study sample
e.g. for disease X with prevalence of 10% & sample size of 60, then 60 is equal to 10% of your whole sample,
therefore the whole sample consists of 600 participants (60 × 10)
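The scaling logic above can be sketched as follows (the function name is my own; the numbers are the worked example from the text):

```python
import math

def cohort_total(diseased_needed, prevalence):
    """Cohort design: scale the required diseased count up by the prevalence."""
    return math.ceil(diseased_needed / prevalence)

total = cohort_total(60, 0.10)  # worked example: 600 participants in total
case_control_total = 2 * 60     # two separately sampled groups of 60 each
```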
Sample Size Calculation Using PASS Software
The overall rationale of determining the minimum sample size is:
1 for a screening study: to detect as many true positives as possible; hence it shall
necessitate a sufficiently high degree of sensitivity but may not require a similarly
high degree of specificity.
2 for a diagnostic study: to detect as many true positives and true negatives at the
same time; hence it shall necessitate a sufficiently high degree of both sensitivity
and specificity.

Review of the Results
By fixing the values of the power of a screening or diagnostic study and also the
type I error, the minimum sample size required for determining both the sensitivity
and specificity of a screening or diagnostic test will increase when there is a smaller
clinically important difference (in both sensitivity and specificity of a diagnostic test)
between those proposed in the null hypothesis and those proposed in the
alternative hypothesis.
Discussion
# The concept of null hypothesis is to estimate the values of sensitivity and specificity before the study is conducted.
# The minimum sample size required will increase if either:
1 a lower value of both sensitivity and specificity of a screening or diagnostic test is adopted within the null hypothesis, or
2 there is a smaller difference (in the values of both sensitivity or specificity of a screening or diagnostic test) between those adopted within the null
hypothesis and those adopted within the alternative hypothesis
# The minimum sample size required will depend on the prespecified values of the power of the screening or diagnostic test (which depend on the
research objectives of the study), its corresponding level of type I error (i.e., its p-value) and the effect size.
# Predetermined sensitivity & specificity:
1 For a Screening study:
> Null Hypothesis: sensitivity should be at least 50%
(to indicate that the probability or chance for the instrument to detect a true positive is at least 50%)
> Alternative Hypothesis: sensitivity should be at least 70%
(to indicate that the probability or chance for the instrument to detect a true positive is at least 70%)
2 For a Diagnostic Study:
> Null Hypothesis: sensitivity & specificity should be at least 70%
(to indicate that the probability or chance for the instrument to detect a true positive or a true negative is at least 70%)
> Alternative Hypothesis: sensitivity & specificity should be at least 80% (to indicate that the instrument is fairly good as a diagnostic tool)
Determination of a Minimum Sample Size Required for a Screening Study
# The aim is to determine to what extent a specific newly developed instrument is as
sensitive as a screening tool to screen patients for disease X.
# Note that the minimum sample size required for screening studies will depend on
whether the sensitivity or the specificity of the screening test is being measured.
# A bigger minimum sample size will be required:
1 for measuring the sensitivity of a screening test when the prevalence of the disease is lower
2 for measuring the specificity of a screening test when the prevalence is higher.

Determination of a Minimum Sample Size Required for a Diagnostic Study
# The aim is a high value of both sensitivity and specificity for a specific diagnostic instrument.
Other Considerations
# Note that sample size planning will only provide an estimate because it is sometimes difficult to know the exact prevalence of a disease in the population
and also the true performance of a specific screening or diagnostic tool until the research study has been completed.
Therefore, it is acceptable for researchers to adopt the discrete values which are nearest to the estimates obtained from the literature (e.g. if the literature
provides a prevalence of 4.5%, researchers can adopt the 5% prevalence present in the table for their sample size calculation)
# The estimated minimum sample size required will range from 22 to 4,860 depending on the prespecified values of the power of the
screening or diagnostic test, its corresponding type I error (i.e., its p-value), and the effect size.
Researchers are advised not to adopt a very small sample size, such as 22 subjects (Prevalence = 90%, Ho = 0.5 and Ha = 0.8), although its sample size
calculation is still valid.
At the same time, researchers may often be quite reluctant to recruit a large sample of patients because this will be costly and time-consuming.
# For researchers who face difficulty in obtaining a reliable estimate of the effect size, a sample of at least 300 subjects is often sufficiently
large to evaluate both the sensitivity and specificity of most screening or diagnostic tests.
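One widely cited precision-based approach (Buderer's formula) ties the required sample size to the expected sensitivity, the prevalence, and the desired width of the confidence interval. A sketch, with all planning values hypothetical:

```python
import math

def n_for_sensitivity(sens, prevalence, precision, z=1.96):
    """Buderer-style normal approximation: total sample size so that
    sensitivity is estimated within +/- precision at ~95% confidence."""
    diseased_needed = (z ** 2) * sens * (1 - sens) / precision ** 2
    return math.ceil(diseased_needed / prevalence)

# Hypothetical planning values: expected sensitivity 80%, prevalence 10%,
# desired precision of +/- 5 percentage points:
n = n_for_sensitivity(sens=0.80, prevalence=0.10, precision=0.05)
```

As the formula shows, lower prevalence or tighter precision inflates the required total, which is why small-effect, low-prevalence studies demand the largest samples.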
CAT Summary: Checklist
# The Problem (case scenario)
# Results of the Research
# Is the Research (results) Valid?
1 was there an independent, blind comparison of the test of interest with a ‘gold’ standard
(reference) test?
2 was the diagnostic test evaluated in an appropriate spectrum of patients (patients you might
expect to see in practice)?
3 was the reference standard applied regardless of the diagnostic test result?
4 Did the study contain enough cases to compare the new test and the gold standard test reliably?
Did the authors include a power calculation?
# Is the Research Important? / What are the results?
5 Sensitivity, Specificity, LR+, LR−, Pretest probability
Sensitivity should be high to catch as many cases as possible.
Specificity should be high to rule out as many non-cases as possible.
# Can I Apply it to My Patient? / How Relevant are the Results?
6 Is the diagnostic test available, affordable, accurate, & precise in your setting?
7 Can you generate a clinically sensible estimate of your patient's pretest probability?
(It is possible to get a rough idea of how prevalent the condition you are trying to diagnose is in
your patients)
from practice data, from personal experience, from the report itself, or from clinical speculation?
8 Will the resulting posttest probabilities (PPV & NPV) affect your management and help your
patient?
Would the results change management?
Are patients willing to be treated?
Could it move you across a test-treatment threshold?
9 Is the diagnostic test likely to be accurate in your patients?
Would its predictive values be good enough for the prevalence of the condition in your patients?
Positive test results are more likely to be accurate when the condition is more common in people
like your patient
negative test results are more likely to be accurate when the condition is less common in people
like your patient
# Clinical “bottomline”