
Receiver operating characteristic curve: overview and practical use for clinicians


Francis Sahngun Nahm 1,2
Department of Anesthesiology and Pain Medicine, 1 Seoul National University Bundang Hospital, Seongnam, 2 Seoul National University College of Medicine, Seoul, Korea

Statistical Round
Korean J Anesthesiol 2022;75(1):25-36
https://doi.org/10.4097/kja.21209
pISSN 2005-6419 • eISSN 2005-7563

Using diagnostic testing to determine the presence or absence of a disease is essential in clinical practice. In many cases, test results are obtained as continuous values and require a process of conversion and interpretation into a dichotomous form to determine the presence of a disease. The primary method used for this process is the receiver operating characteristic (ROC) curve. The ROC curve is used to assess the overall diagnostic performance of a test and to compare the performance of two or more diagnostic tests. It is also used to select an optimal cut-off value for determining the presence or absence of a disease. Although clinicians who do not have expertise in statistics do not need to understand the complex mathematical equations and the analytic process of ROC curve analysis, understanding its core concepts is a prerequisite for the proper use and interpretation of the ROC curve. This review describes the basic concepts for the correct use and interpretation of the ROC curve, including parametric/nonparametric ROC curves, the meaning of the area under the ROC curve (AUC), the partial AUC, methods for selecting the best cut-off value, and the statistical software to use for ROC curve analyses.

Keywords: Area under curve; Mathematics; Reference values; Research design; ROC curve; Routine diagnostic tests; Statistics.

Received: May 18, 2021
Revised: July 9, 2021 (1st); August 19, 2021 (2nd)
Accepted: August 29, 2021

Corresponding author: Francis Sahngun Nahm, M.D., Ph.D.
Department of Anesthesiology and Pain Medicine, Seoul National University Bundang Hospital, 82 Gumi-ro, 173 Beon-gil, Bundang-gu, Seongnam 13620, Korea
Tel: +82-31-787-7499, Fax: +82-31-787-4063
Email: [email protected]
ORCID: https://orcid.org/0000-0002-5900-7851

Introduction
Using diagnostic testing to determine the presence or absence of a disease is an essential process in the medical field. To determine whether a patient is diseased or not, it is necessary to select the diagnostic method with the best performance by comparing various diagnostic tests. In many cases, test results are obtained as continuous values, which require conversion and interpretation into dichotomous groups to determine the presence or absence of a disease. At this point, determining the cut-off value (also called the reference value) to discriminate between normal and abnormal conditions is critical. The method that is mainly used for this process is the receiver operating characteristic (ROC) curve. The ROC curve aims to classify a patient's disease state as either positive or negative based on test results and to find the optimal cut-off value with the best diagnostic performance. The ROC curve is also used to evaluate the overall diagnostic performance of a test and to compare the performance of two or more tests.

Although non-statisticians do not need to understand all the complex mathematical equations and the analytical processes associated with ROC curves, understanding the core concepts of ROC curve analysis is a prerequisite for the correct interpretation and application of analysis results. This review describes the basic concepts for the correct use and interpretation of the ROC curve, including how to draw an ROC curve, the difference between parametric and nonparametric ROC curves, the meaning of the area under the ROC curve (AUC) and the partial AUC, the methods for selecting the best cut-off value, and the statistical software for ROC curve analysis.

The Korean Society of Anesthesiologists, 2022. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Online access in http://ekja.org


Sensitivity, specificity, false positive, and false negative

To understand the ROC curve, it is first necessary to understand the meaning of sensitivity and specificity, which are used to evaluate the performance of a diagnostic test. Sensitivity is defined as the proportion of people who actually have a target disease that are tested positive, and specificity is the proportion of people who do not have a target disease that are tested negative. A false positive (FP) refers to the proportion of people that do not have a disease but are incorrectly tested positive, while a false negative (FN) refers to the proportion of people that have the disease but are incorrectly tested negative (Table 1). The ideal test would have a sensitivity and specificity equal to 1.0; however, this situation is rare in clinical practice, since one of the two tends to decrease when the other increases.

As shown in Fig. 1, when a diagnostic test is performed, the group with the disease and the group without the disease cannot be completely separated, and overlap exists. Fig. 1A shows two hypothetical distributions corresponding to a situation where the mean value of a test result is 75 in the diseased group and 45 in the non-diseased group. In this situation, if the cut-off value is set to 60, people with the disease who have a test result < 60 will be incorrectly classified as not having the disease (false negative). When a physician lowers the cut-off value to 55 to increase the sensitivity of the test, the number of people who will test positive increases (increased sensitivity), but the number of false positives also increases (Fig. 1B).

Table 1. The Decision Matrix

                               Predicted condition
                               Test (+)    Test (−)
True condition   Disease (+)   a           b
                 Disease (−)   c           d

The receiver operating characteristic curve is drawn with the x-axis as 1 − specificity (false positive) and the y-axis as sensitivity. Sensitivity = a / (a + b), specificity = d / (c + d), false negative = b / (a + b), false positive = c / (c + d), and accuracy = (a + d) / (a + b + c + d).

Fig. 1. Graphical illustrations of two hypothetical distributions for patients with or without the disease of interest. The vertical line indicates the cut-point criterion to determine the presence of the disease. TN: true negative, TP: true positive, FN: false negative, FP: false positive.

What is the ROC curve?

The ROC curve is an analytical method, represented as a graph, that is used to evaluate the performance of a binary diagnostic classification method. The diagnostic test results need to be classified into one of two clearly defined dichotomous categories, such as the presence or absence of a disease. However, since many test results are presented as continuous or ordinal variables, a reference value (cut-off value) for diagnosis must be set. Whether a disease is present can thus be determined based on the cut-off value. An ROC curve is used for this process.

The ROC curve was initially developed to discriminate between a signal (true positive result) and noise (false positive result) when analyzing signals on a radar screen during World War II. This method, which has been used for signal detection/discrimination, was later introduced to psychology [1,2] and has since been widely used in the field of medicine to evaluate the performance of diagnostic methods [3–6]. It has recently also been applied in various other fields, such as bioinformatics and machine learning [7,8].

The ROC curve connects the coordinate points using "1 − specificity (false positive rate)" as the x-axis and "sensitivity" as the y-axis for all cut-off values measured from the test results. The stricter the criteria for determining a positive result, the more the points on the curve shift downward and to the left (Fig. 2, Point A). In contrast, if a loose criterion is applied, the point on the curve moves upward and to the right (Fig. 2, Point B).
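The quantities defined in Table 1 can be sketched in a few lines of code. This is a minimal illustration; the function name and the example counts (a = 45, b = 5, c = 10, d = 40) are hypothetical, not taken from the article.

```python
# Sensitivity, specificity, and related rates from the 2 x 2 decision matrix
# in Table 1 (a = true positives, b = false negatives, c = false positives,
# d = true negatives). Names and example counts are illustrative.

def diagnostic_measures(a, b, c, d):
    """Return the basic diagnostic measures defined in Table 1."""
    return {
        "sensitivity": a / (a + b),          # TP / (TP + FN)
        "specificity": d / (c + d),          # TN / (FP + TN)
        "false_negative_rate": b / (a + b),  # FN / (TP + FN)
        "false_positive_rate": c / (c + d),  # FP / (FP + TN)
        "accuracy": (a + d) / (a + b + c + d),
    }

m = diagnostic_measures(a=45, b=5, c=10, d=40)
print(m["sensitivity"], m["specificity"])  # 0.9 0.8
```

Note that sensitivity and the false negative rate (and likewise specificity and the false positive rate) always sum to 1, which is why the ROC curve can use "1 − specificity" as its x-axis.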

The ROC curve has various advantages and disadvantages. First, the ROC curve provides a comprehensive visualization for discriminating between normal and abnormal over the entire range of test results. Second, because the ROC curve shows all the sensitivity and specificity pairs at each cut-off value obtained from the test results, the data do not need to be grouped as in a histogram to draw the curve. Third, since the ROC curve is a function of sensitivity and specificity, it is not affected by prevalence, meaning that samples can be taken regardless of the prevalence of a disease in the population [9]. However, the ROC curve also has some disadvantages. The cut-off value for distinguishing normal from abnormal is not directly displayed on the ROC curve, and neither is the number of samples. In addition, while the ROC curve appears more jagged with a smaller sample size, a larger sample does not necessarily result in a smoother curve.

Types of ROC curves

The types of ROC curves can be primarily divided into nonparametric (or empirical) and parametric. Examples of the two curves are shown in Fig. 3, and the advantages and disadvantages of these two methods are summarized in Table 2. The parametric method is also referred to as the binormal method. By expanding the sample size and connecting countless points, the parametric ROC curve forms the shape of a smooth curve [10]. This method estimates the curve using maximum likelihood estimation when the two independent groups with different means and standard deviations follow a normal distribution or meet the normality assumption through algebraic conversion or square root transformation [11,12]. If the two normal distributions obtained from the two groups have considerable overlap, the ROC curve will be close to the 45° diagonal, whereas if only small portions of the two normal distributions overlap, the ROC curve will be located much farther from the 45° diagonal.

However, when the ROC curve is obtained using the parametric method, an improper ROC curve is obtained if the data do not meet the normality assumption or if the within-group variations are not similar (heteroscedasticity). An example of an improper parametric ROC curve is shown in Fig. 4. To use a parametric ROC curve, researchers must therefore check whether the outcome values in the diseased and non-diseased groups follow a normal distribution or whether a transformation is required to follow a normal distribution.

Fig. 2. A receiver operating characteristic (ROC) curve connects coordinate points with 1 − specificity (= false positive rate) as the x-axis and sensitivity as the y-axis at all cut-off values measured from the test results. When a strict cut-off point (reference) value is applied, the point on the curve moves downward and to the left (Point A). When a loose cut-off point value is applied, the point moves upward and to the right (Point B). The 45° diagonal line serves as the reference line, since it is the ROC curve of random classification.

Fig. 3. The features of the empirical (nonparametric) and binormal (parametric) receiver operating characteristic (ROC) curves. In contrast to the empirical ROC curve, the binormal ROC curve assumes a normal distribution of the data, resulting in a smooth curve. For estimating the binormal ROC curve, the sample mean and sample standard deviation are calculated from the disease-positive group and the disease-negative group. The 45° diagonal line serves as the reference line, since it is the ROC curve of random classification.
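The binormal construction described above can be sketched as follows: each threshold t maps to one point (FPR, TPR) through the normal cumulative distribution function of each group, and sweeping t traces the smooth parametric curve. The group means (45 and 75) follow the hypothetical example in Fig. 1; the standard deviations of 10 and the function names are illustrative assumptions, not values from the article.

```python
# Sketch of a binormal (parametric) ROC curve: with normally distributed
# test results in both groups, each threshold t gives
#   FPR(t) = P(result >= t | no disease),  TPR(t) = P(result >= t | disease).
# Means follow the Fig. 1 example; SDs of 10 are an illustrative assumption.
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binormal_roc_point(t, mu0=45.0, sigma0=10.0, mu1=75.0, sigma1=10.0):
    fpr = 1.0 - normal_cdf(t, mu0, sigma0)  # non-diseased testing positive
    tpr = 1.0 - normal_cdf(t, mu1, sigma1)  # diseased testing positive
    return fpr, tpr

# Sweeping the threshold traces a smooth curve from (1, 1) down to (0, 0).
for t in (45, 60, 75):
    print(t, binormal_roc_point(t))
```

Lowering the threshold (a looser criterion) moves the point toward (1, 1); raising it (a stricter criterion) moves the point toward (0, 0), exactly as described for Fig. 2.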

Table 2. Pros and Cons of the Nonparametric (Empirical) and Parametric Receiver Operating Characteristic Curve Approaches

Nonparametric ROC curve
  Pros: No need for assumptions about the distribution of data. Provides unbiased estimates of sensitivity and specificity. The plot passes through all points. Uses all data. Computation is simple.
  Cons: Has a jagged or staircase appearance. Compares plots only at observed values of sensitivity or specificity.

Parametric ROC curve
  Pros: Shows a smooth curve. Compares plots at any sensitivity and specificity value.
  Cons: Actual data are discarded. Curve does not necessarily go through actual points. ROC curves and the AUC are possibly biased. Computation is complex.

ROC: receiver operating characteristic curve, AUC: area under the curve.

Fig. 4. A comparison of the empirical (solid line) and parametric (dot-dashed line) receiver operating characteristic (ROC) curves drawn from the same data. In contrast to the empirical ROC curve, an inappropriate parametric ROC curve can be distorted or pass through the 45° diagonal line if the data are not normally distributed or are heteroscedastic. In this case, the empirical method is recommended to overcome this problem.

To overcome this limitation, a nonparametric ROC curve can be used, since this method does not take into account the distribution of the data. This is the most commonly used ROC curve analysis method (also called the empirical method). For this method, the test results do not require an assumption of normality. The sensitivity and false positive rates calculated from the 2 × 2 table based on each cut-off value are simply plotted on the graph, resulting in a jagged line rather than a smooth curve.

Additionally, a semiparametric ROC curve is sometimes used to overcome the drawbacks of the nonparametric and parametric methods. This method has the advantage of presenting a smooth curve without requiring assumptions about the distribution of the diagnostic test results. However, many statistical packages do not include this method, and it is not widely used in medical research.

How is a ROC curve drawn?

Consider an example in which a cancer marker is measured for a total of 10 patients to determine the presence of cancer, and an empirical ROC curve is drawn (Table 3). If the measured value of the cancer marker is the same as or greater than the cut-off value (reference value), the patient is determined to have cancer, whereas if the measured value is less than the reference value, the patient is determined to be normal; a 2 × 2 table is thus created. The sensitivity and specificity change depending on the applied reference value. If the reference value is increased, the specificity increases while the sensitivity decreases. For example, if the reference value for determining cancer is ≥ 43.3, the sensitivity and specificity are calculated as 0.67 and 1.0, respectively (Table 3). To increase the sensitivity, the reference value for a cancer diagnosis is lowered. If the reference value is ≥ 29.0, the sensitivity and specificity are 1.0 and 0.43, respectively. In this way, as the reference value is gradually increased or decreased, the proportion of positive cancer results varies, and each sensitivity and specificity pair can be calculated for each cut-off value. From these calculated pairs of sensitivity and specificity, a graph with "1 − specificity" as the x coordinate and "sensitivity" as the y coordinate can be created (Fig. 5). Some researchers draw an ROC curve by expressing the x-axis as "specificity" rather than "1 − specificity". In this case, the values on the x-axis do not increase from 0 to 1.0, but decrease from 1.0 to 0.
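The procedure just described can be sketched directly from the ten-patient data in Table 3: each candidate cut-off yields one 2 × 2 table and thus one (sensitivity, specificity) pair. The helper name is illustrative.

```python
# Empirical (nonparametric) ROC points for the ten-patient example in
# Table 3. "marker >= cutoff" is read as a positive (cancer) result.

markers = [25.8, 26.6, 28.1, 29.0, 30.5, 31.0, 33.6, 39.3, 43.3, 45.8]
cancer  = [0,    0,    0,    1,    0,    0,    0,    0,    1,    1]

def se_sp(cutoff):
    """Sensitivity and specificity when 'marker >= cutoff' means cancer."""
    tp = sum(1 for m, c in zip(markers, cancer) if c == 1 and m >= cutoff)
    fn = sum(1 for m, c in zip(markers, cancer) if c == 1 and m < cutoff)
    tn = sum(1 for m, c in zip(markers, cancer) if c == 0 and m < cutoff)
    fp = sum(1 for m, c in zip(markers, cancer) if c == 0 and m >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

for cut in sorted(markers):
    se, sp = se_sp(cut)
    print(f"cutoff >= {cut}: Se = {se:.2f}, 1 - Sp = {1 - sp:.2f}")
# e.g. cutoff >= 43.3 gives Se = 0.67 and Sp = 1.00, as in the text.
```

Plotting the resulting (1 − Sp, Se) pairs and connecting them reproduces the jagged empirical curve of Fig. 5A.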

Table 3. An Example of Simple Data with Ten Patients for Drawing Receiver Operating Characteristic Curves

Patient   Confirmed cancer   Tumor marker (continuous value)
 1        (−)                25.8
 2        (−)                26.6
 3        (−)                28.1
 4        (+)                29.0
 5        (−)                30.5
 6        (−)                31.0
 7        (−)                33.6
 8        (−)                39.3
 9        (+)                43.3
10        (+)                45.8

Binary results at each cut-off value (test positive if tumor marker ≥ cut-off):

Cut-off value               ≥25.8  ≥26.6  ≥28.1  ≥29.0  ≥30.5  ≥31.0  ≥33.6  ≥39.3  ≥43.3  ≥45.8  >45.8
Cancer (+): Test (+)/(−)     3/0    3/0    3/0    3/0    2/1    2/1    2/1    2/1    2/1    1/2    0/3
Cancer (−): Test (+)/(−)     7/0    6/1    5/2    4/3    4/3    3/4    2/5    1/6    0/7    0/7    0/7
Sensitivity                 1.00   1.00   1.00   1.00   0.67   0.67   0.67   0.67   0.67   0.33   0
Specificity                 0.00   0.14   0.29   0.43   0.43   0.57   0.71   0.86   1.00   1.00   1.00

Suppose three patients had biopsy-confirmed cancer diagnoses. The continuous test results can be transformed into binary categories by comparing each value with the cut-off (reference) value; each column corresponds to one cut-off value. As the cut-off value increases, the sensitivity for cancer diagnosis decreases and the specificity increases. At each cut-off value, one pair of sensitivity and specificity values can be obtained from the 2 × 2 table.

Fig. 5. Empirical (A) and parametric (B) receiver operating characteristic (ROC) curves drawn from the data in Table 3. Eleven labeled points on the empirical ROC curve correspond to the cut-off values used to estimate sensitivity and specificity. A gradual increase or decrease of the cut-off values will change the proportion of disease-positive patients. Depending on the cut-off values, each sensitivity and specificity pair can be obtained. Using these calculated sensitivity and specificity pairs, a ROC curve can be obtained with "1 − specificity" as the x coordinates and "sensitivity" as the y coordinates.

The area under the curve (AUC)

The AUC is widely used to measure the accuracy of diagnostic tests. The closer the ROC curve is to the upper left corner of the graph, the higher the accuracy of the test, because in the upper left corner the sensitivity = 1 and the false positive rate = 0 (specificity = 1). The ideal ROC curve thus has an AUC = 1.0. However, when the coordinates of the x-axis (1 − specificity) and the y-axis correspond 1 : 1 (i.e., true positive rate = false positive rate), the graph is drawn on the 45° diagonal (y = x) of the ROC plot (AUC = 0.5).
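As a sketch of how an empirical AUC can be computed, the rank-based (Mann-Whitney) formulation gives the AUC as the probability that a randomly chosen diseased patient has a higher marker value than a randomly chosen non-diseased patient, with ties counting one half. The data are the ten patients from Table 3; the function name is illustrative.

```python
# Empirical AUC as the Mann-Whitney statistic: the proportion of
# (diseased, non-diseased) pairs ordered correctly by the marker,
# using the ten-patient data from Table 3.

markers = [25.8, 26.6, 28.1, 29.0, 30.5, 31.0, 33.6, 39.3, 43.3, 45.8]
cancer  = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1]

def empirical_auc(values, labels):
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0     # correctly ordered pair
            elif p == n:
                score += 0.5     # ties count one half
    return score / (len(pos) * len(neg))

print(empirical_auc(markers, cancer))  # 0.8095... for this example
```

A value of 0.5 would correspond to the 45° diagonal (random classification), and 1.0 to perfect separation of the two groups.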

Such a situation corresponds to determining the presence or absence of disease by a random method, such as a coin toss, and has no meaning as a diagnostic tool. Therefore, for any diagnostic technique to be meaningful, the AUC must be greater than 0.5, and in general, it must be greater than 0.8 to be considered acceptable (Table 4) [13]. In addition, when comparing the performance of two or more diagnostic tests, the ROC curve with the largest AUC is considered to have the better diagnostic performance.

Table 4. Interpretation of the Area Under the Curve

Area under the curve (AUC)   Interpretation
0.9 ≤ AUC                    Excellent
0.8 ≤ AUC < 0.9              Good
0.7 ≤ AUC < 0.8              Fair
0.6 ≤ AUC < 0.7              Poor
0.5 ≤ AUC < 0.6              Fail

For a diagnostic test to be meaningful, the AUC must be greater than 0.5. Generally, an AUC ≥ 0.8 is considered acceptable.

The AUC is often presented with a 95% CI because the data obtained from the sample are not fixed values but rather are influenced by statistical errors. The 95% CI provides a range of possible values around the actual value. Therefore, for any test to be statistically significant, the lower 95% CI value of the AUC must be > 0.5.

The CI of the AUC can be estimated using the parametric or nonparametric method. The binormal method proposed by Metz [14] and McClish and Powell [15] is used to estimate the CI of the AUC using the parametric approach. These methods use maximum likelihood under the assumption of a normal distribution. Several nonparametric approaches have also been proposed to estimate the AUC of the empirical ROC curve and its variance. One such approach, the rank-sum test using the Mann-Whitney method, approximates the variance based on the exponential distribution [16]. However, the disadvantage of the rank-sum test is that it underestimates the variance when the AUC is close to 0.5 and overestimates the variance as the AUC approaches 1. To overcome this drawback, DeLong et al. [17] proposed a method of minimizing errors in variance estimates using generalized U-statistics without the normality assumptions used in the binormal method, which is provided in many statistical software packages.

Nonparametric AUC estimates for empirical ROC curves tend to underestimate the AUC on a discrete rating scale, such as a 5-point scale. Except when the sample size is extremely small, the parametric method is preferred even for discrete data, because the bias in the parametric estimates of the AUC is small enough to be negligible. However, if the collected data are not normally distributed, a nonparametric method is the correct option. For continuous data, the parametric and nonparametric estimates of the AUC have very similar values [18]. In general, when the sample size is large, the AUC estimate follows a normal distribution. Therefore, when determining whether there is a statistically significant difference between two AUCs (AUC1 vs. AUC2), the following Z statistic can be used. To determine whether an AUC (A1) is significant under the null hypothesis, Z can be calculated by substituting A2 = 0.5:

(1)
Z = (A1 − A2) / √Var(A1 − A2)

Partial AUC (pAUC)

When comparing the AUCs of two diagnostic tests, if the AUC values are the same, this only means that the overall diagnostic performances of the two tests are the same; it does not necessarily mean that the ROC curves of the two tests are the same [19]. For example, suppose two ROC curves intersect. In this case, even if the AUCs of the two ROC curves are the same, the diagnostic performance of test A may be superior in a specific region of the curve, and test B may be superior in another region. In this case, the pAUC can be used to evaluate the diagnostic performance in a specific region (Fig. 6) [11,12].

As its name suggests, the pAUC is the area below a portion of the ROC curve. It is the region between two points of the false positive rate (FPR); the pAUC between two FPRs (FPR1 = e1 and FPR2 = e2) can be expressed as A(e1 ≤ FPR ≤ e2). For the entire ROC curve, e1 = 0 and e2 = 1, and when e1 = e2 = e, A is the sensitivity at the point where FPR = e. However, a potential problem with the pAUC is that the minimum possible value of the pAUC depends on the region along the ROC curve that is selected.

The minimum possible value of the pAUC can be expressed as ½ (e2 − e1)(e2 + e1) [15]. However, one issue is that the minimum pAUC value in the range 0 ≤ FPR ≤ 0.2 is ½ (0.2 − 0)(0.2 + 0) = 0.02, whereas in the range 0.8 ≤ FPR ≤ 1.0, the minimum value of the pAUC is ½ (1.0 − 0.8)(1.0 + 0.8) = 0.18. Therefore, unlike the AUC, for which the maximum possible value is always 1, the pAUC value depends on the two chosen FPRs. The pAUC must therefore be standardized. To do this, the pAUC is divided by the maximum value that the pAUC can have, which is called the partial area index [20]. The partial area index can be interpreted as the average sensitivity in the selected FPR interval.
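The pAUC bookkeeping above can be sketched as follows. The minimum possible pAUC over an FPR interval [e1, e2] is ½ (e2 − e1)(e2 + e1), the area under the 45° diagonal over that interval, while the maximum is e2 − e1, the area when sensitivity = 1.0 throughout. The function names are illustrative.

```python
# Minimum/maximum possible pAUC over an FPR interval [e1, e2], and the
# partial area index (observed pAUC divided by its maximum). Function
# names are illustrative sketches of the quantities described in the text.

def pauc_bounds(e1, e2):
    minimum = 0.5 * (e2 - e1) * (e2 + e1)  # area under the 45-degree diagonal
    maximum = e2 - e1                      # area when sensitivity = 1.0 throughout
    return minimum, maximum

def partial_area_index(pauc, e1, e2):
    """Standardized pAUC: interpretable as average sensitivity on [e1, e2]."""
    _, maximum = pauc_bounds(e1, e2)
    return pauc / maximum

# Reproduces the worked values in the text: 0.02 for [0, 0.2], 0.18 for [0.8, 1.0].
print(round(pauc_bounds(0.0, 0.2)[0], 2), round(pauc_bounds(0.8, 1.0)[0], 2))
```

The asymmetry of the two minima (0.02 vs. 0.18 for intervals of equal width) is exactly why the raw pAUC values of different FPR regions cannot be compared without standardization.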

Fig. 6. Schematic diagram of two receiver operating characteristic (ROC) curves with an equal area under the ROC curve (AUC). Although the AUC is the same, the features of the ROC curves are not identical. Test B shows better performance in the high false-positive rate range than test A, whereas test A is better in the low false-positive range. In this example, the partial AUC (pAUC) can compare these two ROC curves over a specific false positive rate range.

In addition, the maximum pAUC between FPR1 = e1 and FPR2 = e2 is equal to e2 − e1, which is the width of the region when sensitivity = 1.0. By using the pAUC, it is possible to focus on the region of the ROC curve appropriate to a specific clinical situation. Therefore, the performance of the diagnostic test can be evaluated in a specific FPR interval that is appropriate to the purpose of the study.

The sample size for the ROC curve analysis

To calculate the sample size for an ROC curve analysis, the expected AUCs to be compared (namely, AUC1 and AUC2, where AUC2 = 0.5 for the null hypothesis), the significance level (α), the power (1 − β), and the ratio of negative to positive results should be considered [16]. For example, if there are twice as many negative results as positive results, the ratio = 2, and if there are the same number of negative and positive results, the ratio = 1. If two tests are performed on the same group to evaluate test performance, the two ROC curves are not independent of each other. Therefore, two correlation coefficients between the two diagnostic methods are additionally needed, one for cases showing negative results and one for those showing positive results [21]. The correlation coefficient required here is Pearson's correlation coefficient when the test result is measured as a continuous variable and Kendall's tau (τ) when measured as an ordinal variable [21].

Determining the optimal cut-off value

In general, it is crucial to set a cut-off value with an appropriate sensitivity and specificity, because applying less stringent criteria to increase sensitivity results in a trade-off in which specificity decreases. Finding the optimal cut-off value is not simply done by maximizing sensitivity and specificity, but by finding an appropriate compromise between them based on various criteria. Sensitivity is more important than specificity when a disease is highly contagious or associated with serious complications, such as COVID-19. In contrast, specificity is more important than sensitivity when a test to confirm the diagnosis is expensive or highly risky. If there is no preference between sensitivity and specificity, or if both are equally important, then the most reasonable approach is to maximize them both. Since the methods introduced here are based on various assumptions, the choice of which method to use should be judged based on the importance of the sensitivity versus the specificity of the test. There are more than 30 known methods for finding the optimal cut-off value [22]. Some of the commonly used methods are introduced below.

Youden's J statistic

Youden's J statistic refers to the distance between the 45° diagonal and the ROC curve while moving the 45° diagonal (a straight line with a slope of 1) in the direction of the coordinate (0, 1) (Fig. 7A). Youden's J statistic can be calculated as follows, where the point at which this value is maximized is determined as the optimal cut-off value [23]:

(2)
J = Se + Sp − 1

Euclidean distance

Another method for determining the optimal reference value is to use the Euclidean distance from the coordinate (0, 1), which is also called the upper-left (UL) index [24]. For this method, the optimal cut-off value is determined using the basic principle that the AUC value should be large. Therefore, the distance between the coordinate (0, 1) and the ROC curve should be minimized [25,26]. The Euclidean distance is calculated as follows:

(3)
Euclidean distance = √[(1 − Se)² + (1 − Sp)²]

The point at which this value is minimized is considered the optimal cut-off value.
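Both criteria can be sketched with a few of the (sensitivity, specificity) pairs from Table 3: Youden's J (Eq. 2) is maximized, while the Euclidean distance to the corner (0, 1) (Eq. 3) is minimized. The helper names are illustrative.

```python
# Cut-off selection by Youden's J (maximize Se + Sp - 1, Eq. 2) and by the
# Euclidean distance to (0, 1) (minimize, Eq. 3), using three of the
# (sensitivity, specificity) pairs from Table 3.
import math

pairs = {29.0: (1.00, 0.43), 39.3: (0.67, 0.86), 43.3: (0.67, 1.00)}

def youden_j(se, sp):
    return se + sp - 1

def euclid(se, sp):
    return math.sqrt((1 - se) ** 2 + (1 - sp) ** 2)

best_j = max(pairs, key=lambda c: youden_j(*pairs[c]))
best_d = min(pairs, key=lambda c: euclid(*pairs[c]))
print(best_j, best_d)  # both criteria select 43.3 here
```

The two criteria can disagree on other data sets, which is why the choice of method should reflect the relative importance of sensitivity versus specificity in the clinical setting.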

(0, 1)
(0, 1)

Youden's J (0, 1) (1 − Sp)2 + (1 − Se)2

1 − Se

(0, Se) 1 − Sp
(1 − Sp, Se)

Se

Se

(1, 0)
(1, 0)
1 − Sp Sp

A 1 − Sp
(1 − Sp, 0)
(1, 0) B C
Fig. 7. Figures illustrating the various methods to select the best cut-off values. (A) Youden’s J statistics, (B) Euclidean distance to the upper-left
corner, and (C) maximum multiplication of sensitivity and specificity.

The point at which this value is minimized is considered the optimal cut-off value. The Euclidean distance on the ROC curve is shown in Fig. 7B.

Accuracy
Accuracy refers to the proportion of the cases that are accurately classified, as shown in Table 1.

(4)
Accuracy = (True positive number + True negative number) / Total number

This definition assumes that all correctly classified results (whether true positive or true negative) are of equal value and that all misclassified results are equally undesirable. However, this is often not the case. The costs of false-positive and false-negative classifications are rarely equivalent; the greater the cost difference between false-positive and false-negative results, the more likely the accuracy is to distort the clinical usefulness of the test results. Accuracy is also highly dependent on the prevalence of the disease in the sample; therefore, even when the sensitivity and specificity are low, the accuracy may be high [27]. In addition, this method has the disadvantage that, as the sensitivity and specificity change, there may be two or more points at which this value is maximized.

Index of union (IU)
IU uses the absolute difference between the diagnostic measurement and the AUC value to minimize the misclassification rate, calculated using the following formula [28]:

(5)
IU = |Se − AUC| + |Sp − AUC|

IU is a method for finding the point at which the sensitivity and specificity are simultaneously maximized. It is similar to the Euclidean distance; however, it differs in that it uses the absolute differences between the AUC value and the diagnostic accuracy measurements (sensitivity and specificity). This method does not require complicated calculations since it only involves checking whether the sensitivity and specificity at the optimal cut-off value are sufficiently close to the AUC values. In addition, the IU has been found to have better diagnostic performance than the other methods in most cases [28].

Cost approach
The cost approach is a method for finding the optimal cut-off value that takes into account the benefits of correct classification or the costs of misclassification. This method can be used when the costs of the true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) of a diagnostic test are known. The costs here can be medical or financial and can be considered from a patient and/or social perspective. When determining the cut-off value using the cost approach, there are two ways: to calculate the cost itself [27] or to use the cost index (fm) [29]. These are calculated as follows:

(6)
Cost = CFN × (1 − Se) × Pr + CFP × (1 − Sp) × (1 − Pr) + CTP × Se × Pr + CTN × Sp × (1 − Pr)

fm = Se − [(1 − Pr) / Pr] × [(CFP − CTN) / (CFN − CTP)] × (1 − Sp)
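As a hedged sketch of the cost approach (the prevalence and the four costs below are invented for illustration), Equation (6) and the cost index fm can be evaluated at each candidate operating point:

```python
# Hypothetical prevalence and per-outcome costs, all expressed in a common unit.
Pr = 0.20                  # disease prevalence
C_TP, C_TN = 10.0, 0.0     # costs attached to true positive / true negative results
C_FP, C_FN = 50.0, 400.0   # costs attached to false positive / false negative results

def expected_cost(se, sp):
    # Equation (6): average cost per patient tested.
    return (C_FN * (1 - se) * Pr + C_FP * (1 - sp) * (1 - Pr)
            + C_TP * se * Pr + C_TN * sp * (1 - Pr))

def cost_index(se, sp):
    # fm: maximizing this index minimizes the average cost.
    return se - ((1 - Pr) / Pr) * ((C_FP - C_TN) / (C_FN - C_TP)) * (1 - sp)

# Candidate (Se, Sp) operating points from a hypothetical ROC curve.
candidates = [(0.95, 0.40), (0.90, 0.60), (0.80, 0.80), (0.60, 0.90)]
best = max(candidates, key=lambda p: cost_index(*p))
print(best)  # → (0.8, 0.8)
```

Because the cost is an affine function of Se and (1 − Sp), maximizing fm and minimizing the average cost select the same operating point, which the snippet can be used to confirm.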
Korean J Anesthesiol 2022;75(1):25-36

where Pr is the prevalence, and CFP, CTN, CFN, and CTP refer to the costs of FPs, TNs, FNs, and TPs, respectively. These four costs should be expressed in a common unit. When the cost index (fm) is maximized, the average cost is minimized, and this point is considered the optimal cut-off value.

Another method for determining the optimal cut-off value in terms of cost is to use the misclassification cost term (MCT). Considering only the prevalence of the disease, the CFP, and the CFN, the point at which the MCT is minimized is determined as the optimal cut-off value [29], expressed as follows:

(7)
MCT = (CFN / CFP) × Pr × (1 − Se) + (1 − Pr) × (1 − Sp)

Positive likelihood ratio (LR+) and negative likelihood ratio (LR−)
LR+ is the ratio of true positives to false positives, and LR− is the ratio of false negatives to true negatives.

(8)
LR+ = TP / FP = Se / (1 − Sp)
LR− = FN / TN = (1 − Se) / Sp

Researchers can choose a cut-off value that either maximizes the LR+ or minimizes the LR−.

Maximum product of sensitivity and specificity
For this method, the point at which the product of Se and Sp is maximized is considered the optimal cut-off value.

(9)
Maximum product = max [Se × Sp]

This can also be represented graphically, as shown in Fig. 7C. A square can be obtained whose vertex is on the line connecting the unit square's upper-left and lower-right corners within the ROC curve (Se = Sp line). When this square meets the ROC curve, Se × Sp is maximized.

Maximum sum of sensitivity and specificity
For this method, the point at which the sum of Se and Sp is maximized is considered the optimal cut-off value.

(10)
Maximum sum = max [Se + Sp]

At the point where the summation value is maximized, Youden's index (Se + Sp − 1) and the difference between the true positives (Se) and false positives (1 − Sp) are also maximized [25]. This method is straightforward; however, the drawback is that, as the Se and Sp change, there may be more than one point at which this value is maximized. When there are two or more points at which the summed value is maximized, the researcher must decide whether to determine the optimal cut-off value based on the sensitivity or the specificity.

Number needed to misdiagnose (NNM)
This method refers to the number of patients required to obtain one misdiagnosis when conducting a diagnostic test. In other words, if NNM = 10, ten people must be tested to find one misdiagnosed patient. The higher the NNM, the better the test performance. The NNM is calculated as follows, and the point at which the NNM is maximized can be selected as the optimal cut-off value [30]:

(11)
NNM = 1 / (FN + FP) = 1 / [Pr × (1 − Se) + (1 − Pr) × (1 − Sp)]

Statistical program for the ROC curve analysis

Statistical programs used to perform ROC curve analyses include various commercial software programs, such as IBM SPSS, MedCalc, Stata, and NCSS, and open-source software such as R. Most statistical analysis software programs provide basic ROC analysis functions; however, the functions provided by each software product differ slightly. IBM SPSS, the most widely used commercial software, provides fundamental statistical analyses for ROC curves, such as plotting ROC curves and calculating the AUC and its CIs with statistical significance. However, IBM SPSS does not include functions for determining optimal cut-off values and does not provide a sample size calculation. Stata provides a variety of functions for ROC curve analyses, including the pAUC, multiple ROC curve comparisons, optimal cut-off value determination using Youden's index, and multiple performance measures. MedCalc, as the name suggests, is software developed specifically for medical research. MedCalc provides a sample size estimation for a single diagnostic test and includes various analytical techniques to determine the optimal cut-off value but does not provide a function to calculate the pAUC.

Unlike commercial software packages, the R program is free, open-source software that includes all the functions for ROC curve analyses through packages such as ROCR [31], pROC [32], and OptimalCutpoints [22]. Among the R packages, ROCR is one of the most comprehensive packages for analyzing ROC curves and includes functions to calculate the AUC with CIs;


however, its options for selecting the optimal cut-off value are very limited. The pROC package provides more comprehensive and flexible functions than ROCR. The pROC package can be used to compare the AUC with the pAUC using various methods, and it provides CIs for the sensitivity, specificity, AUC, and pAUC. Similar to ROCR, pROC also provides some functions for determining the optimal cut-off value, which can be determined using Youden's index and the UL index. The pROC package can also be used to calculate the sample size required for a single diagnostic test or to compare two diagnostic tests. OptimalCutpoints is a sophisticated R package specially developed to determine the optimal cut-off value. It has the advantage of providing 34 methods for determining the optimal cut-off value.

Although these R packages have a considerable number of functions, they require good programming knowledge of the R language. For someone who is not an R user, working with a command-based interface may therefore be challenging and time-consuming. Web-based tools that combine several R packages have recently been developed to overcome these shortcomings, enabling more straightforward ROC analyses. These web tools for ROC curve analysis based on R, which include easyROC and plotROC [33,34], are web-based applications that use the R packages plyr, pROC, and OptimalCutpoints to perform ROC curve analyses, extending the functions of multiple ROC packages in R so that researchers can perform ROC curve analyses through an easy-to-use interface without writing R code. The functions of various statistical packages for ROC curve analyses are compared in Table 5.

Summary

The ROC curve is used to represent the overall performance of a diagnostic test by connecting the coordinate points with "1 − specificity" (= false positive rate) as the x-axis and "sensitivity" as the y-axis for all cut-off points at which the test results are measured. It is also used to determine the optimal cut-off value for diagnosing a disease. The AUC is a measure of the overall performance of a diagnostic test and can be interpreted as the average value of the sensitivities for all possible specificities. The AUC has a value between 0 and 1 but is meaningful as a diagnostic test only when it is > 0.5; the larger the value, the better the overall performance of the test. Since nonparametric estimates of the AUC tend to be underestimated with discrete grade scale data, whereas parametric estimates of the AUC have a low risk of bias unless the sample size is very small, parametric estimates are recommended for discrete grade scale data. When evaluating the diagnostic performance of a test in only some regions of the overall ROC curve, the pAUC should be used in the specific FPR regions.

Youden's index, the Euclidean distance, the accuracy, and the cost index can be used to determine the optimal cut-off value; however, the approach should be selected according to the clinical situation that the researcher intends to analyze. Various commercial programs and R packages, as well as web tools based on R, can be used for ROC curve analyses.

In conclusion, the ROC curve is a statistical method used to determine the diagnostic method and the cut-off value showing the best diagnostic performance. The best diagnostic test method and the optimal cut-off value should be determined using the appropriate method.
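The workflow summarized above can be sketched end to end in plain Python (the scores below are hypothetical, and a real analysis would normally use one of the packages in Table 5): sweep the observed values as cut-offs, trace the ROC coordinates, integrate the AUC with the trapezoidal rule, and report the Youden-optimal cut-off together with its NNM.

```python
# Hypothetical test scores (higher = more likely diseased) and true status.
diseased = [3.1, 3.8, 4.0, 4.5, 5.2, 5.9]
healthy = [1.2, 1.9, 2.5, 3.0, 3.2, 3.5]

def se_sp(cutoff):
    # Call the test positive when the score is >= cutoff.
    se = sum(s >= cutoff for s in diseased) / len(diseased)
    sp = sum(s < cutoff for s in healthy) / len(healthy)
    return se, sp

cutoffs = sorted(set(diseased + healthy))
roc = [(1.0, 0.0)] + [se_sp(c) for c in cutoffs] + [(0.0, 1.0)]  # (Se, Sp) pairs

# ROC coordinates as (FPR, TPR) = (1 - Sp, Se); AUC by the trapezoidal rule.
pts = sorted((1 - sp, se) for se, sp in roc)
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Youden-optimal cut-off and its number needed to misdiagnose (Eq. 11).
prev = 0.5  # prevalence taken equal to the sample mix for this toy example
best_c = max(cutoffs, key=lambda c: sum(se_sp(c)) - 1)
se, sp = se_sp(best_c)
nnm = 1 / (prev * (1 - se) + (1 - prev) * (1 - sp))
print(round(auc, 3), best_c, round(nnm, 2))  # → 0.944 3.8 12.0
```

The trapezoidal AUC here (34/36 ≈ 0.944) matches the Mann-Whitney interpretation of the AUC as the probability that a randomly chosen diseased subject scores higher than a randomly chosen healthy one.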

Table 5. Comparison of the Statistical Packages for Receiver Operating Characteristic Curve Analyses

| Type | Statistical package | ROC plot | Confidence interval | pAUC | Multiple comparisons | Cut-off values | Sample size | Open source | Web tool access | User interface |
|------|---------------------|----------|---------------------|------|----------------------|----------------|-------------|-------------|-----------------|----------------|
| Commercial | IBM SPSS (ver. 25) | ○ | ○ | × | × | × | × | × | × | ○ |
| Commercial | STATA (ver. 14) | ○ | ○ | ○ | ○ | ○ | × | × | × | ○ |
| Commercial | MedCalc (ver. 19.4.1) | ○ | ○ | × | ○ | ○ | ○ | × | × | ○ |
| Commercial | NCSS 2021 | ○ | ○ | × | ○ | ○ | ○ | × | × | ○ |
| Free | OptimalCutpoints (ver. 1.1-4) | ○ | ○ | × | × | ○ | × | ○ | × | × |
| Free | ROCR (ver. 1.0-11) | ○ | ○ | ○ | × | × | × | ○ | × | × |
| Free | pROC (ver. 1.17.0.1) | ○ | ○ | ○ | ○ | ○ | ○ | ○ | × | ○ |
| Free | easyROC (ver. 1.3.1) | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| Free | plotROC (ver. 2.2.1) | ○ | ○ | × | ○ | ○ | × | ○ | ○ | ○ |

This table was adapted and modified from Goksuluk et al. [33]. ROC: receiver operating characteristic, pAUC: partial area under the ROC curve. ○: possible, ×: impossible.


Acknowledgements

The author would like to thank Ms. Mihee Park at the Seoul National University Bundang Hospital for her assistance in editing the figures included in this paper.

Funding

None.

Conflicts of Interest

No potential conflict of interest relevant to this article was reported.

References

1. Tanner WP Jr, Swets JA. A decision-making theory of visual detection. Psychol Rev 1954; 61: 401-9.
2. Lusted LB. Signal detectability and medical decision-making. Science 1971; 171: 1217-9.
3. Joo Y, Cho HR, Kim YU. Evaluation of the cross-sectional area of acromion process for shoulder impingement syndrome. Korean J Pain 2020; 33: 60-5.
4. Lee S, Cho HR, Yoo JS, Kim YU. The prognostic value of median nerve thickness in diagnosing carpal tunnel syndrome using magnetic resonance imaging: a pilot study. Korean J Pain 2020; 33: 54-9.
5. Wang L, Shen J, Das S, Yang H. Diffusion tensor imaging of the C1-C3 dorsal root ganglia and greater occipital nerve for cervicogenic headache. Korean J Pain 2020; 33: 275-83.
6. Jung SM, Lee E, Park SJ. Validity of bispectral index monitoring during deep sedation in children with spastic cerebral palsy undergoing injection of botulinum toxin. Korean J Anesthesiol 2019; 72: 592-8.
7. Sonego P, Kocsor A, Pongor S. ROC analysis: applications to the classification of biological sequences and 3D structures. Brief Bioinform 2008; 9: 198-209.
8. Sui Y, Lu K, Fu L. Prediction and analysis of novel key genes ITGAX, LAPTM5, SERPINE1 in clear cell renal cell carcinoma through bioinformatics analysis. PeerJ 2021; 9: e11272.
9. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med 2013; 4: 627-35.
10. Obuchowski NA. Receiver operating characteristic curves and their use in radiology. Radiology 2003; 229: 3-8.
11. Obuchowski NA. ROC analysis. AJR Am J Roentgenol 2005; 184: 364-72.
12. Zou KH, O'Malley AJ, Mauri L. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 2007; 115: 654-7.
13. Muller MP, Tomlinson G, Marrie TJ, Tang P, McGeer A, Low DE, et al. Can routine laboratory tests discriminate between severe acute respiratory syndrome and other causes of community-acquired pneumonia? Clin Infect Dis 2005; 40: 1079-86.
14. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978; 8: 283-98.
15. McClish DK, Powell SH. How well can physicians estimate mortality in a medical intensive care unit? Med Decis Making 1989; 9: 125-32.
16. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29-36.
17. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44: 837-45.
18. Zhou XH, McClish DK, Obuchowski NA. Statistical methods in diagnostic medicine. New York, John Wiley & Sons. 2002.
19. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21: 720-33.
20. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996; 201: 745-50.
21. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983; 148: 839-43.
22. López-Ratón M, Rodríguez-Álvarez MX, Cadarso-Suárez C, Gude-Sampedro F. OptimalCutpoints: an R package for selecting optimal cutpoints in diagnostic tests. J Stat Softw 2014; 61: 1-36.
23. Youden WJ. Index for rating diagnostic tests. Cancer 1950; 3: 32-5.
24. Koyama T, Hamada H, Nishida M, Naess PA, Gaarder C, Sakamoto T. Defining the optimal cut-off values for liver enzymes in diagnosing blunt liver injury. BMC Res Notes 2016; 9: 41.
25. Akobeng AK. Understanding diagnostic tests 3: receiver operating characteristic curves. Acta Paediatr 2007; 96: 644-7.
26. Hobden B, Schwandt ML, Carey M, Lee MR, Farokhnia M, Bouhlal S, et al. The validity of the Montgomery-Asberg Depression Rating Scale in an inpatient sample with alcohol dependence. Alcohol Clin Exp Res 2017; 41: 1220-7.
27. Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993; 39: 561-77.
28. Unal I. Defining an optimal cut-point value in ROC analysis: an alternative approach. Comput Math Methods Med 2017; 2017: 3762651.
29. Greiner M. Two-graph receiver operating characteristic (TG-ROC): update version supports optimisation of cut-off values that minimise overall misclassification costs. J Immunol Methods 1996; 191: 93-4.
30. Habibzadeh F, Yadollahie M. Number needed to misdiagnose: a measure of diagnostic test effectiveness. Epidemiology 2013; 24: 170.
31. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics 2005; 21: 3940-1.
32. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011; 12: 77.
33. Goksuluk D, Korkmaz S, Zararsiz G, Karaagaoglu AE. easyROC: an interactive web-tool for ROC curve analysis using R language environment. R J 2016; 8: 213-30.
34. Sachs MC. plotROC: a tool for plotting ROC curves. J Stat Softw 2017; 79: 2.

