BIOSTATISTICS FOR CLINICIANS
Receiver Operating Characteristic Curve in Diagnostic
Test Assessment
Jayawant N. Mandrekar, PhD
Abstract: The performance of a diagnostic test in the case of a binary predictor can be evaluated using the measures of sensitivity and specificity. However, in many instances, we encounter predictors that are measured on a continuous or ordinal scale. In such cases, it is desirable to assess performance of a diagnostic test over the range of possible cutpoints for the predictor variable. This is achieved by a receiver operating characteristic (ROC) curve that includes all the possible decision thresholds from a diagnostic test result. In this brief report, we discuss the salient features of the ROC curve, as well as discuss and interpret the area under the ROC curve, and its utility in comparing two different tests or predictor variables of interest.

Key Words: Sensitivity, Specificity, ROC, AUC.

(J Thorac Oncol. 2010;5: 1315–1316)

Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota.
Disclosure: The author declares no conflicts of interest.
Address for correspondence: Jayawant N. Mandrekar, PhD, Department of Health Sciences Research, Mayo Clinic, 200 1st Street SW, Rochester, MN 55905. E-mail: [email protected]
In a previous article, we discussed the measures of sensitivity and specificity that rely on a single cutpoint to classify a test result as positive or negative.1 In the event of a continuous or ordinal predictor, there are often multiple such cutpoints. Although sensitivity and specificity can be computed treating each value of the predictor as a possible cutpoint, a receiver operating characteristic (ROC) curve that includes all the possible decision thresholds from a diagnostic test result offers a more comprehensive assessment. In this review, we will introduce the salient features of an ROC curve, discuss the measure of area under the ROC curve (AUC), and introduce the methods for the comparison of ROC curves.

ROC CURVE
Simply defined, an ROC curve is a plot of the sensitivity versus 1 − specificity of a diagnostic test. The different points on the curve correspond to the different cutpoints used to determine whether the test results are positive. An ROC curve can be considered as the average value of the sensitivity for a test over all possible values of specificity, or vice versa. A more general interpretation is in terms of the probability that, for a randomly selected pair of patients, one with and one without the disease/condition, the patient with the disease/condition has a result indicating greater suspicion.2,3

As a simple illustration, Table 1 gives the ratings of images obtained from 109 subjects by a radiologist.2,3 Multiple cutpoints are possible for classifying a patient as normal or abnormal based on the image ratings. Suppose, for instance, that ratings of 4 or above indicate that the test is positive (abnormal); then the sensitivity and specificity would be 0.86 (44/51) and 0.78 (45/58), respectively. In contrast, if ratings of 3 or above were considered positive, then the sensitivity and specificity would be 0.90 (46/51) and 0.67 (39/58), respectively. This illustrates that both sensitivity and specificity are specific to the selected decision threshold. Moreover, the designation of a cutpoint to classify the test results as positive or negative is relatively arbitrary.

TABLE 1. True Disease Status by Image Ratings

                              Image Ratings
True Disease    1 = Definitely   2 = Probably   3 =       4 = Probably   5 = Definitely
Status          Normal           Normal         Unsure    Abnormal       Abnormal         Total
Normal               33               6             6          11               2            58
Abnormal              3               2             2          11              33            51
Total                36               8             8          22              35           109

An ROC curve, on the other hand, does not require the selection of a particular cutpoint. See Figure 1 for the ROC curve for the data presented in Table 1. An ROC curve essentially has two components: the empirical ROC curve, obtained by joining the points represented by the sensitivity and 1 − specificity for the different cutpoints, and the chance diagonal, represented by the 45-degree line drawn through the coordinates (0,0) and (1,1). If the test results diagnosed patients as positive or negative for the disease/condition by pure chance, then the ROC curve would fall on the diagonal line. Sometimes a fitted (smooth) ROC curve based on a statistical model can also be plotted in addition to the empirical ROC curve.

FIGURE 1. The receiver operating characteristic curve for the data in Table 1.

An overall ROC curve is most useful in the early stages of evaluation of a new diagnostic test. Once the diagnostic ability of a test is established, only a portion of the ROC curve is usually of interest, for example, only the regions with high specificity rather than the average specificity over all sensitivity values. Similar to sensitivity and specificity, ROC curves are invariant to the prevalence of a disease but dependent on the patient characteristics and the disease spectrum. An ROC curve does not depend on the scale of the test results and can be used to provide a visual comparison of two or more test results on a common scale. The latter is not possible with sensitivity and specificity measures because a change in the cutpoint used to classify the test results as positive or negative could affect the two tests differently.4
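As a brief illustration of how the points on the empirical ROC curve arise, the Python sketch below recomputes sensitivity and 1 − specificity from the Table 1 counts at every possible decision threshold ("rating ≥ c is called abnormal"). This is only a sketch for this example; the variable and function names are ours and are not part of any published software.

```python
# Counts from Table 1: image ratings (1-5) for truly normal and truly abnormal patients.
normal = {1: 33, 2: 6, 3: 6, 4: 11, 5: 2}      # 58 patients without the condition
abnormal = {1: 3, 2: 2, 3: 2, 4: 11, 5: 33}    # 51 patients with the condition

def roc_point(cutpoint):
    """Sensitivity and 1 - specificity when 'rating >= cutpoint' is called abnormal."""
    tp = sum(n for r, n in abnormal.items() if r >= cutpoint)
    fp = sum(n for r, n in normal.items() if r >= cutpoint)
    return tp / sum(abnormal.values()), fp / sum(normal.values())

# One point per possible decision threshold, from the strictest to the most lenient.
for c in (6, 5, 4, 3, 2, 1):
    sens, one_minus_spec = roc_point(c)
    print(f"rating >= {c}: sensitivity = {sens:.2f}, 1 - specificity = {one_minus_spec:.2f}")
```

Joining the resulting (1 − specificity, sensitivity) pairs gives the empirical ROC curve of Figure 1; the chance diagonal is simply the line through (0,0) and (1,1). The thresholds of 4 and 3 reproduce the sensitivity/specificity pairs quoted above.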
AREA UNDER THE ROC CURVE
AUC is an effective way to summarize the overall diagnostic accuracy of the test. It takes values from 0 to 1, where a value of 0 indicates a perfectly inaccurate test and a value of 1 reflects a perfectly accurate test. AUC can be computed using the trapezoidal rule.3 In general, an AUC of 0.5 suggests no discrimination (i.e., ability to diagnose patients with and without the disease or condition based on the test), 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 is considered excellent, and more than 0.9 is considered outstanding.5

A value of 0.5 for AUC indicates that the ROC curve will fall on the diagonal (i.e., the 45-degree line) and hence suggests that the diagnostic test has no discriminatory ability. ROC curves above this diagonal line are considered to have reasonable discriminating ability to diagnose patients with and without the disease/condition. It is therefore natural to perform a hypothesis test to evaluate whether the AUC differs significantly from 0.5. Specifically, the null and alternative hypotheses are defined as H0: AUC = 0.5 versus H1: AUC ≠ 0.5. The test statistic, given by (ÂUC − 0.5)/SE(ÂUC), where ÂUC is the estimated AUC, is approximately normally distributed and has favorable statistical properties.6

For the data in Table 1, the AUC is 0.89. This suggests an 89% chance that the radiologist reading the image will correctly distinguish a normal from an abnormal patient based on the ordering of the image ratings. However, in the event of a tied rating, the assumption is that the radiologist will randomly assign one patient as normal and the other as abnormal. A formal hypothesis test of H0: AUC = 0.5 versus H1: AUC ≠ 0.5 for this example yields a test statistic of 12.2, with a p value <0.001, indicating that this test has excellent discriminating ability.5
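To make the AUC computation and the test against 0.5 concrete, the sketch below recomputes the trapezoidal AUC for the Table 1 data and a z statistic for H0: AUC = 0.5. The standard error used here is the Hanley and McNeil estimator from reference 2; it is one reasonable choice among several, so the resulting statistic (about 12.1) differs slightly from the 12.2 quoted above.

```python
import math

# Counts from Table 1 (ratings 1-5).
normal = {1: 33, 2: 6, 3: 6, 4: 11, 5: 2}      # truly normal, n2 = 58
abnormal = {1: 3, 2: 2, 3: 2, 4: 11, 5: 33}    # truly abnormal, n1 = 51
n1, n2 = sum(abnormal.values()), sum(normal.values())

# Empirical ROC points (1 - specificity, sensitivity), one per cutpoint.
points = []
for c in (6, 5, 4, 3, 2, 1):
    tp = sum(n for r, n in abnormal.items() if r >= c)
    fp = sum(n for r, n in normal.items() if r >= c)
    points.append((fp / n2, tp / n1))

# Area under the empirical ROC curve by the trapezoidal rule (reference 3).
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hanley-McNeil standard error for the AUC (reference 2); other estimators
# exist, so the z value may differ trivially from the 12.2 quoted in the text.
q1 = auc / (2 - auc)
q2 = 2 * auc ** 2 / (1 + auc)
se = math.sqrt((auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
                + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2))

z = (auc - 0.5) / se   # test statistic for H0: AUC = 0.5
print(f"AUC = {auc:.2f}, SE = {se:.3f}, z = {z:.1f}")   # AUC = 0.89, z about 12.1
```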
COMPARING TWO OR MORE ROC CURVES
ROC curves are useful for comparing the diagnostic
ability of two or more screening tests or for assessing the
predictive ability of two or more biomarkers for the same
disease. In general, the test with the higher AUC may be considered better. However, when only specific values of sensitivity and specificity are clinically relevant for the comparison, partial AUCs are compared instead.
ROC curves generated using data from patients where each patient is subjected to two (or more) different diagnostic tests of interest are considered correlated ROC curves. ROC curves generated using data from different groups of patients, where each group is subjected to a different diagnostic test, are referred to as uncorrelated ROC curves. The comparison of two uncorrelated ROC curves is
relatively simple and is based on a form of a Z statistic that
uses the difference in the area under the two curves and the
SD of each AUC. In the case of correlated ROC curves, we
refer the readers to a nonparametric approach proposed by
DeLong et al.7
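A minimal sketch of the Z statistic described above for two uncorrelated ROC curves follows; the AUC and standard error values are hypothetical and chosen only to show the arithmetic, and for correlated curves the DeLong approach of reference 7 would be used instead.

```python
import math

# Hypothetical summary values for two diagnostic tests evaluated in
# independent (uncorrelated) patient groups; these numbers are illustrative only.
auc_a, se_a = 0.89, 0.032   # test A: estimated AUC and its standard error
auc_b, se_b = 0.81, 0.040   # test B: estimated AUC and its standard error

# Z statistic for comparing two uncorrelated ROC curves: the difference in
# AUCs divided by the standard error of that difference.
z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
print(f"z = {z:.2f}")   # refer z to the standard normal distribution
```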
SUMMARY
Studies designed to measure the performance of diagnostic tests are important for patient care and health care costs. ROC curves are a useful tool in the assessment of the performance of a diagnostic test over the range of possible values of a predictor variable. The area under an ROC curve provides a measure of discrimination and allows investigators to compare the performance of two or more diagnostic tests.

REFERENCES
1. Mandrekar JN. Simple statistical measures for diagnostic accuracy assessment. J Thorac Oncol 2010;5:763–764.
2. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
3. Rosner B. Fundamentals of Biostatistics, 6th Ed. Chapter 3. Belmont, CA: Duxbury, 2005. Pp. 64–66.
4. Turner DA. An intuitive approach to receiver operating characteristic curve analysis. J Nucl Med 1978;19:213–220.
5. Hosmer DW, Lemeshow S. Applied Logistic Regression, 2nd Ed. Chapter 5. New York, NY: John Wiley and Sons, 2000. Pp. 160–164.
6. Zhou XH, Obuchowski NA, Obuchowski DM. Statistical Methods in Diagnostic Medicine. Chapter 2. New York, NY: John Wiley and Sons, 2002. Pp. 27–33.
7. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–845.