Journal of Statistical Software: ThresholdROC: Optimum Threshold Estimation
Abstract
We introduce an R package that estimates decision thresholds in diagnostic settings
with a continuous marker and two or three underlying states. The package implements
parametric and non-parametric estimation methods based on minimizing an overall cost
function, as well as confidence interval estimation approaches to account for the sampling
variability of the cut-off. Further features of the package include sample size determination
and estimation of diagnostic accuracy measures. We used randomly generated data and
two real datasets to illustrate the capabilities and characteristics of the package.
Keywords: ROC curve, threshold estimation, cost function, diagnostic tests, R package, bootstrap.
1. Introduction
In the diagnostic area, it is of interest to predict the state of a subject (usually, “diseased”
or “non-diseased”) using a continuous diagnostic test with a classifying threshold, that is,
a value of the diagnostic marker that classifies subjects into two categories: positive and
negative for the disease under study. However, the diagnostic problem can also include more
than two classification states, such as “non-diseased”, “mild condition” or “severe condition”.
The ability of the diagnostic marker to discriminate between states is usually evaluated with
the area under the receiver operating characteristic (ROC) curve in the two-state setting
(Metz 1978; Pepe 2003) and the volume under the surface (VUS) for the three-state setting
(Nakas, Alonzo, and Yiannoutsos 2010).
To estimate the threshold that optimally discriminates between states, standard methods
either choose the threshold that achieves a desired false positive/negative rate or,
more formally, maximize the Youden index, which is the sum, diminished by unity,
of the two fractions showing the proportions correctly classified (Youden 1950; Nakas et al.
2010). Another method based on defining an overall cost function, which includes correct
and incorrect classification rates and the relevant weights associated with each decision, thus
allowing disease prevalence to be also considered, was proposed (Metz 1978; Pepe 2003) and
further developed (Jund, Rabilloud, Wallon, and Ecochard 2005; Skaltsa, Jover, and Carrasco
2010; Skaltsa, Jover, Fuster, and Carrasco 2012). In this methodology, the estimation process
is focused on minimizing the cost function. The methodology takes into consideration the
following: (1) all classification rates should be taken into account; (2) each wrong or right
decision can have a different impact on the final result and (3) disease prevalence can also
play a role in threshold selection or estimation. Zweig and Campbell (1993) warned that
although a diagnostic test can be highly accurate, “its cost or undesirability of false results
may be so high that there is no threshold for which the trade-off between sensitivity and
specificity is acceptable”. Cantor, Sun, Tortolero-Luna, Richards-Kortum, and Follen (1999)
recommended that clinicians weigh their decisions in different fields and provided reasonable
values for different applications.
The cost-minimizing approach provides point estimates for the threshold(s) in a given setting.
Confidence intervals can also be estimated to account for sampling variability, especially
when the distributions overlap heavily; in that case threshold estimation becomes cumbersome and
alternative management (e.g., further examination) may be required for those subjects with
marker values close to the estimated threshold (Skaltsa et al. 2012). Further methodological
issues related to sample size requirements have also been addressed for the classic two-state
setting (Skaltsa et al. 2010).
The statistical software currently available for optimum threshold estimation mainly deals
with accuracy. However, there are some programs addressing the costs involved in threshold
estimation, providing either an expected value for each possible threshold, which should be
maximized (e.g., MedRoc; StenStat 2017) or a cost function and its values for each threshold,
which should be minimized (e.g., Analyse-it; Analyse-it Software, Ltd 2017). ROCR (Sing,
Sander, Beerenwinkel, and Lengauer 2005) is a powerful R package (R Core Team 2017) for
ROC visualization that provides tools to plot the cost function when costs for false positives
and false negatives are defined. Another relevant R package in the field is pROC (Robin et al.
2011), which, among many other functions, provides confidence intervals of the sensitivity
and the specificity of a given set of thresholds. However, these packages do not estimate the
threshold confidence interval or address the sample size issue in this context.
Here, we present the R package ThresholdROC (Perez-Jaume, Pallarès, and Skaltsa 2017),
which implements a wide range of techniques for threshold estimation and sample size
computation. The package is available from the Comprehensive R Archive Network (CRAN) at
https://CRAN.R-project.org/package=ThresholdROC. In this paper, we briefly present
the methodology behind the ThresholdROC functions and refer to Skaltsa et al. (2010) and
Skaltsa et al. (2012) for further details. We define the cost function and derive analytical
threshold estimators under the binormality/trinormality assumption. We also develop analytical
variance estimators and construct confidence intervals for the optimum diagnostic
threshold. Moreover, we address the empirical method, which is an alternative approach
when no distributional assumptions can be made. The optimal sample size ratio of diseased
to non-diseased subjects may be of interest during study design and it can also be obtained
using a function in the package under the assumption of binormality. Thus, ThresholdROC
can perform a wide range of calculations when continuous measurements are involved and
the true state of the subjects is known. The rest of the article is structured as follows. We describe
the methodology for estimation and inference in Section 2. Then, we illustrate how the
package can be used to calculate optimum threshold estimates and their confidence intervals
in two- and three-state settings using randomly generated data in Section 3 and on two real
datasets in Section 4. Finally, a discussion and concluding remarks are given in Section 5.
2. Threshold estimation
In this section, we present the threshold estimation methods implemented by ThresholdROC
for two- and three-state settings, as well as the methodology for estimating sample size in a
binormal setting.
The overall cost function should be minimized (Metz 1978; Pepe 2003). For the two-state
setting, the expression for the cost function is
threshold can also be estimated on the basis of the empirical costs. In this case, each sample
value is used as a threshold and the overall cost calculated. Thus, the optimum threshold is
that with the lowest cost. Parametric estimation can also be applied when the distributions
for both diseased and non-diseased populations are known.
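The empirical search described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name empirical_threshold is hypothetical, higher marker values are assumed to indicate disease, subjects with values at or above the threshold are classified as positive, and the cost is expressed per subject as the prevalence-weighted sum of the classification rates multiplied by their costs.

```r
# Hypothetical helper (not part of ThresholdROC): empirical cost-minimizing threshold.
# k1: marker values of non-diseased subjects; k2: marker values of diseased subjects.
# Assumes larger values indicate disease and "marker >= t" means a positive test.
empirical_threshold <- function(k1, k2, rho, c_tp = 0, c_tn = 0, c_fp = 1, c_fn = 1) {
  candidates <- sort(unique(c(k1, k2)))          # each sample value is a candidate
  cost <- vapply(candidates, function(t) {
    se <- mean(k2 >= t)                          # empirical sensitivity
    sp <- mean(k1 < t)                           # empirical specificity
    rho * (c_tp * se + c_fn * (1 - se)) +
      (1 - rho) * (c_tn * sp + c_fp * (1 - sp))  # expected cost per subject
  }, numeric(1))
  list(threshold = candidates[which.min(cost)], cost = min(cost),
       candidates = candidates, costs = cost)
}

set.seed(1)
k1 <- rnorm(200, 0, 1)   # illustrative non-diseased sample
k2 <- rnorm(200, 2, 1)   # illustrative diseased sample
res <- empirical_threshold(k1, k2, rho = 0.5)
res$threshold
```

Because every observed value is tried, the search is exhaustive, but the resulting cost curve is a step function, so neighbouring candidates may have nearly identical costs.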
To generalize the approach used for the two-state setting we will follow Skaltsa et al. (2012).
Let k be the number of possible states, X a continuous marker and Tl the thresholds between
the k states, with l = 1, . . . , k − 1. If n is the sample size, ρi the prevalence of the ith state,
Cij the cost of classifying an individual of class i as class j and Fi the distribution function
of the population in the ith class, then the cost function is defined as
$$C = n \sum_{i=1}^{k} \sum_{j=1}^{k} \rho_i \, C_{ij} \, \bigl(F_i(T_j) - F_i(T_{j-1})\bigr),$$
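To make the double sum concrete, the following base R sketch evaluates the cost function for arbitrary thresholds. The function name overall_cost and the example populations are hypothetical, and the sketch assumes the boundary conventions T_0 = -Inf and T_k = Inf, which are not spelled out in the text above.

```r
# Hypothetical helper: overall cost for k ordered states.
# thresholds: increasing vector T_1 < ... < T_{k-1}; T_0 = -Inf and T_k = Inf are assumed.
# rho: prevalences (length k); costs: k x k matrix with costs[i, j] = C_ij;
# cdfs: list of k distribution functions F_i; n: sample size.
overall_cost <- function(thresholds, rho, costs, cdfs, n = 1) {
  k <- length(rho)
  stopifnot(length(thresholds) == k - 1, all(dim(costs) == c(k, k)), length(cdfs) == k)
  cuts <- c(-Inf, thresholds, Inf)
  total <- 0
  for (i in seq_len(k)) {
    # Probability that a subject in state i is classified into each state j
    p_ij <- diff(vapply(cuts, cdfs[[i]], numeric(1)))
    total <- total + rho[i] * sum(costs[i, ] * p_ij)
  }
  n * total
}

# Illustrative two-state case: N(0, 1) versus N(2, 1), unit misclassification costs
cdfs <- list(function(q) pnorm(q, 0, 1), function(q) pnorm(q, 2, 1))
costs <- matrix(c(0, 1,
                  1, 0), 2, 2, byrow = TRUE)
cost_at_1 <- overall_cost(1, rho = c(0.5, 0.5), costs = costs, cdfs = cdfs)
cost_at_1
```

In this symmetric example the cost at a threshold of 1 reduces to pnorm(-1), the common misclassification probability of the two populations.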
$$\mathrm{VAR}(T) = d \, \Sigma \, d^{\top},$$
where d is the vector of derivatives of T with respect to θ = (µD , µD̄ , σD , σD̄ );
µD , µD̄ , σD and σD̄ stand for the means and standard deviations of the diseased and
non-diseased populations, respectively; and Σ is the variance-covariance matrix of θ.
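A numerical version of this delta-method calculation might look as follows. The sketch rests on several assumptions that are ours rather than the package's: the threshold is taken as the point between the two means where the normal densities cross (the R = 1 case), the derivatives are approximated by central finite differences instead of analytical expressions, and Σ is built from the usual large-sample variances of means and standard deviations of independent normal samples. The names cross_threshold and num_grad are hypothetical.

```r
# Threshold where two normal densities cross (root searched between the two means).
# theta = (mu_D, mu_Dbar, sigma_D, sigma_Dbar), in the order used in the text.
cross_threshold <- function(theta) {
  g <- function(t) dnorm(t, theta[1], theta[3]) - dnorm(t, theta[2], theta[4])
  uniroot(g, lower = min(theta[1:2]), upper = max(theta[1:2]), tol = 1e-12)$root
}

# Central finite-difference gradient of a scalar function f with respect to theta
num_grad <- function(f, theta, h = 1e-5) {
  vapply(seq_along(theta), function(j) {
    e <- replace(numeric(length(theta)), j, h)
    (f(theta + e) - f(theta - e)) / (2 * h)
  }, numeric(1))
}

theta <- c(mu_D = 2, mu_Dbar = 0, sigma_D = 1, sigma_Dbar = 1)
n_D <- 100
n_Dbar <- 100

# Assumed Sigma: independent samples, large-sample variances of means and SDs
Sigma <- diag(c(theta[3]^2 / n_D, theta[4]^2 / n_Dbar,
                theta[3]^2 / (2 * n_D), theta[4]^2 / (2 * n_Dbar)))

d <- num_grad(cross_threshold, theta)        # vector of derivatives of T
var_T <- drop(t(d) %*% Sigma %*% d)          # VAR(T) = d Sigma d'
c(threshold = cross_threshold(theta), se = sqrt(var_T))
```

With equal standard deviations the threshold is the midpoint of the means, so only the two mean components contribute and the variance is 0.25/100 + 0.25/100 = 0.005.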
In the three-state setting, variance can be estimated using parametric methods based on
non-linear equations (Skaltsa et al. 2012; Mak 1993) with the expression:
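$$\mathrm{VAR}(T_i) = \frac{V_{ii}}{A_{ii}^{2}},$$
where $V_{ii} = \mathrm{VAR}\left(\partial C / \partial T_i\right)$ and $A_{ii}$ are the diagonal elements of the matrix
$A = E\left(\left.\partial^{2} C / \partial T^{2}\right|_{\hat{T}}\right)$, $\hat{T}$ being a root of $\partial C / \partial T$.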
Bootstrapping can also be applied to calculate confidence intervals in both two- and three-
state settings (Efron and Tibshirani 1998). There are two ways in which bootstrapping
can be used. In the first approach, the standard error of the threshold is estimated using
bootstrapping, with the corresponding confidence interval being obtained on the assumption
that the threshold estimators follow a normal distribution. In the second approach, the
bootstrap percentile confidence interval is calculated.
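The two bootstrap approaches can be contrasted with a short base R sketch. The data are simulated for illustration, and the estimator est_threshold (the midpoint of the two sample means, i.e., the crossing point of two equal-variance normal densities) is a deliberately simple stand-in chosen by us; any other threshold estimator could be plugged in.

```r
set.seed(123)
k1 <- rnorm(100, 0, 1)   # illustrative non-diseased sample
k2 <- rnorm(100, 2, 1)   # illustrative diseased sample

# Illustrative estimator (an assumption): midpoint of the two sample means
est_threshold <- function(x0, x1) (mean(x0) + mean(x1)) / 2

B <- 1000
alpha <- 0.05
# Resample each group separately, with replacement, and re-estimate the threshold
boot_est <- replicate(B, est_threshold(sample(k1, replace = TRUE),
                                       sample(k2, replace = TRUE)))
t_hat <- est_threshold(k1, k2)

# First approach: bootstrap standard error combined with normal quantiles
ci_norm <- t_hat + c(-1, 1) * qnorm(1 - alpha / 2) * sd(boot_est)
# Second approach: percentile interval from the bootstrap distribution
ci_perc <- unname(quantile(boot_est, c(alpha / 2, 1 - alpha / 2)))

rbind(normal = ci_norm, percentile = ci_perc)
```

The normal-based interval is symmetric around the estimate by construction, whereas the percentile interval can be asymmetric when the bootstrap distribution is skewed.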
zα/2 is the α/2th quantile of a standard normal distribution and µD , µD̄ , σD and σD̄ represent
the means and standard deviations of the diseased and non-diseased populations, respectively.
Please see Skaltsa et al. (2010) for details on the formula used when the variances for the
diseased and non-diseased populations are different.
Population-based threshold
The thresTH2() function solves the one-variable equation (1) of the density ratio, as detailed
in Section 2, given the population probability distribution functions, parameters and the cost
and prevalence values. The thresTH2() algorithm uses the uniroot() function of the stats
package, which searches a pre-specified interval for a root of a given function. The user must
specify the probability distributions assumed for the non-diseased and diseased populations
(arguments dist1 and dist2, respectively, which can be any two-parameter continuous
distribution implemented in R), their parameters (four arguments giving the first and the
second parameter of each distribution), the disease prevalence (rho) and the classification
costs (costs argument). Default values are specified
for further options available in the function. It should be noted that the classification costs
must be provided in an object of class ‘matrix’ as follows:
$$\begin{pmatrix} C_{TP} & C_{TN} \\ C_{FP} & C_{FN} \end{pmatrix}.$$
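For instance, a cost matrix with this layout can be built in base R as shown below. The value (1 - rho) / rho used for the false negative cost is our inference from the printed default costs (0 1 0 2.333333 for a prevalence of 0.3), not something taken from the package source, and the dimnames are added only for readability.

```r
rho <- 0.3

# Rows: (C_TP, C_TN) and (C_FP, C_FN), matching the layout shown above.
# The value (1 - rho) / rho for C_FN is inferred from the printed defaults.
costs <- matrix(c(0, 0,
                  1, (1 - rho) / rho), nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = list(c("true", "false"), c("positive", "negative")))
costs

# Read column by column, the entries appear in the order (Ctp, Cfp, Ctn, Cfn)
as.vector(costs)
```

Filling the matrix with byrow = TRUE keeps the code visually aligned with the mathematical layout, which makes it harder to swap the false positive and true negative costs by accident.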
Threshold: 1.235043
Parameters used
Disease prevalence: 0.3
Costs (Ctp, Cfp, Ctn, Cfn): 0 1 0 2.333333
R: 1
We should remark that we used the default cost matrix here, that is, a combination of costs
that leads to R = 1, which is equivalent to using the Youden index method to obtain the
optimum threshold (Skaltsa et al. 2010). As we can see in the output provided by thresTH2(),
the optimum threshold for the example was 1.24. Disease prevalence, costs and R values used
in the thresTH2() computations are also reported.
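The root-finding step can be mimicked with uniroot() directly. In the sketch below the two populations are an assumption on our part (a standard normal non-diseased population and a lognormal diseased population with a mean of 1 and a standard deviation of 0.5 on the log scale), chosen because they are consistent with the threshold reported above; the search interval is also picked by hand. With R = 1 the optimum is taken to be the point where the two densities are equal.

```r
# Assumed populations (not stated alongside the output above):
# non-diseased ~ N(0, 1); diseased ~ lognormal(meanlog = 1, sdlog = 0.5)
f_nondiseased <- function(t) dnorm(t, mean = 0, sd = 1)
f_diseased    <- function(t) dlnorm(t, meanlog = 1, sdlog = 0.5)

# With R = 1 the optimum is taken to be the point where the two densities are equal
density_gap <- function(t) f_nondiseased(t) - f_diseased(t)

# Search interval chosen by hand so that the gap changes sign exactly once inside it
root <- uniroot(density_gap, interval = c(0.1, 3), tol = 1e-10)$root
round(root, 6)
```

Under these assumptions the root agrees with the reported threshold of about 1.24. Note that uniroot() requires the function values at the two ends of the interval to have opposite signs, so a poorly chosen interval results in an error rather than a wrong answer.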
• method = "equal": Assumes binormality and equal variances for non-diseased and
diseased populations. This is the default value.
The user can also choose the method for calculating the confidence interval corresponding to
the threshold estimate using the argument ci.method. The choices currently available are:
• ci.method = "delta": Delta method is used to estimate the threshold standard error
assuming an underlying binormal model. Thus, this option can only be used when
method is "equal" or "unequal". This is the default value.
For further details on these methods please see Skaltsa et al. (2010).
Package ThresholdROC also includes a function that evaluates the second derivative of the
cost function at the estimated threshold (secondDer2()), enabling the assessment of whether
the estimated threshold leads to a minimum in the cost function. A value close to zero would
imply that the minimum lies on a plateau of the cost function, in which case it
would be advisable to revise the cost assignments.
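The idea behind this check can be illustrated without the package by differentiating a cost function numerically. The setting below is hypothetical (normal populations with means 0 and 2, unit standard deviations, prevalence 0.2, costs 0, 1, 0 and 4, and a total sample size of 200), and the finite-difference helper second_derivative is ours; it is not how secondDer2() is implemented.

```r
# Illustrative two-state cost (assumption): N(0, 1) non-diseased, N(2, 1) diseased,
# prevalence 0.2, costs C_TP = C_TN = 0, C_FP = 1, C_FN = 4, total sample size n = 200
cost <- function(t, n = 200, rho = 0.2, c_fp = 1, c_fn = 4) {
  fnr <- pnorm(t, mean = 2, sd = 1)       # diseased classified as negative
  fpr <- 1 - pnorm(t, mean = 0, sd = 1)   # non-diseased classified as positive
  n * (rho * c_fn * fnr + (1 - rho) * c_fp * fpr)
}

# Central finite-difference approximation to the second derivative
second_derivative <- function(f, t, h = 1e-3) (f(t + h) - 2 * f(t) + f(t - h)) / h^2

t_opt <- optimize(cost, interval = c(-2, 4))$minimum
d2 <- second_derivative(cost, t_opt)
c(threshold = t_opt, second_derivative = d2)
```

A clearly positive value indicates a well-defined minimum; repeating the calculation with heavily overlapping populations would push the value towards zero.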
To illustrate how thres2() works, we will use two random samples of size 100 from two
different normal distributions. Data from the non-diseased sample are stored in the vector
k1, whereas those from the diseased population are stored in k2.
R> set.seed(1234)
R> n <- 100
R> k1 <- rnorm(n, 0, 1)
R> k2 <- rnorm(n, 2, 1)
If we assume the disease prevalence to be 0.2 and a binormal setting with equal variances,
the optimum threshold and its corresponding confidence interval can be calculated as follows:
Figure 1: Estimates of the probability density functions for non-diseased and diseased
populations, respectively. The threshold estimate and its 95% confidence interval are also depicted.
Estimate:
Threshold: 0.9422407
Parameters used:
Disease prevalence: 0.2
Costs (Ctp, Cfp, Ctn, Cfn): 0 1 0 4
R: 1
Method: equal
Significance Level: 0.05
The threshold estimate and its confidence interval (with the method used to compute it in
brackets) are provided in the output, as are the disease prevalence, costs, the R term,
estimation method and significance level. Moreover, we can apply the plot() method to
the ‘thres2’ object returned. This method produces a plot that allows visual examination
of the problem: estimates of the probability density functions for both samples, as well as
vertical lines representing the threshold and its confidence interval. The plot() method calls
the density() function of the stats package to compute the density curves, and its default
options can be modified with further arguments in the plot() function (Figure 1).
R> plot(thr2, col = c(1, 2, 4), lwd = c(2, 2, 1), leg.pos = "topright")
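As a cross-check of the point estimate reported above, the calculation can be approximated in base R. The sketch relies on a simplification that we assume applies here: with equal variances and R = 1, two normal densities cross midway between their means, so the estimate should coincide with the midpoint of the two sample means. This is not the package's general formula, which also has to handle other values of R.

```r
set.seed(1234)
n <- 100
k1 <- rnorm(n, 0, 1)   # non-diseased sample, generated as in the example
k2 <- rnorm(n, 2, 1)   # diseased sample, generated as in the example

# With equal variances and R = 1, the densities cross at the midpoint of the means,
# so the sample version is the midpoint of the two sample means (a simplification)
thr_midpoint <- (mean(k1) + mean(k2)) / 2
thr_midpoint
```

The pooled standard deviation does not enter the point estimate in this special case, although it still determines the width of the confidence interval.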
Now we can check whether the threshold estimate is a minimum of the cost function:
Figure 2: Cost function and ROC curve for the two-state example data.
R> round(secondDer2(thr2), 2)
[1] 74.24
The value of the second derivative at the threshold estimate is positive and quite far from zero.
Therefore, we can conclude that the threshold estimate is a reliable optimum. The validity
of the estimate can be confirmed by plotting the cost function using the plotCostROC()
function. Notice that we additionally obtain the corresponding ROC curve (Figure 2).
Sample size
Package ThresholdROC contains the SS() function to estimate the optimum sample size
ratio (diseased to non-diseased) and the sample size required to achieve a specified confidence
interval width and confidence level, assuming a binormal model with either equal or unequal
variances. To demonstrate how SS() works, we will use the following example in which
the non-diseased population follows a normal distribution with a mean of 0 and a standard
deviation of 1 and the diseased population follows a normal distribution with a mean of 2 and
the same standard deviation, with the disease prevalence being 0.3. Default costs are used.
The following code calculates the sample size needed to achieve a desired confidence interval
width of 0.5 and a 95% confidence level (default option):
Optimum SS Ratio: 1
Parameters used
Significance Level: 0.05
CI width: 0.5
Disease prevalence: 0.3
Costs (Ctp, Cfp, Ctn, Cfn): 0 1 0 2.333333
R: 1
The output shows that the optimum ratio is 1:1, i.e., an equal number of diseased and non-
diseased subjects are needed. The minimum sample size required to achieve the desired width
of the confidence interval is 31 diseased and 31 non-diseased subjects.
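The figure of 31 subjects per group can be checked with a back-of-the-envelope calculation. The sketch assumes, as a simplification of ours, that with equal variances, R = 1 and a 1:1 ratio the threshold estimate is the midpoint of two sample means, so that VAR(T) = σ²/(2n) for n subjects per group; setting the interval width 2 z √VAR(T) equal to the desired width and solving for n gives n = 2 z² σ² / width².

```r
alpha <- 0.05
width <- 0.5     # desired total width of the confidence interval
sigma <- 1       # common standard deviation of both populations

# Assumed variance of the threshold estimate with n subjects per group (R = 1, ratio 1:1):
# the estimate is the midpoint of two sample means, so VAR(T) = sigma^2 / (2 * n)
z <- qnorm(1 - alpha / 2)
n_per_group <- ceiling(2 * z^2 * sigma^2 / width^2)
n_per_group
```

The result agrees with the 31 diseased and 31 non-diseased subjects quoted above. The shortcut does not carry over to the unequal-variance case, where the standard deviations also contribute to the variance of the threshold.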
Consider now that the standard deviation of the diseased population is set at 3:
Parameters used
Significance Level: 0.05
CI width: 0.5
Disease prevalence: 0.3
Costs (Ctp, Cfp, Ctn, Cfn): 0 1 0 2.333333
R: 1
The optimum ratio is now around 0.41; thus, 41 diseased individuals are needed for every 100
non-diseased subjects. Hence, the optimum sample size is 153 subjects, 45 diseased and 108
non-diseased individuals.
to this setting are: thresTH3(), which computes the optimum thresholds based on the
distributions assumed for the three states; thres3(), which calculates threshold estimates and
their confidence intervals when sample measurements for each population are available; and
secondDer3(), which computes the second derivative of the cost function to validate the
estimates. Functions providing plots related to the thresholds and their confidence intervals
are also included in package ThresholdROC.
Population-based threshold
Similar to thresTH2(), thresTH3() estimates the theoretical optimum thresholds for specific
distribution parameters, decision costs and prevalences in a three-state setting. The equations
to be solved to find the optimum thresholds in this setting are given in (2). As before, this
is done using the function uniroot(). The arguments in this function are similar to those in
thresTH2(), although here rho must be a 3-dimensional vector of prevalences (indicating the
prevalence of each underlying state) and costs should be a 3 × 3 matrix object as follows:
$$\begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix},$$
where Cij is the cost of classifying an individual of class i as class j, for i, j = 1, 2, 3. The
arguments dist1, dist2 and dist3 are used to specify the distribution assumed for each
population.
To give an example of how this function works, we will consider the following three
populations: a standard normal distribution; a lognormal distribution with a mean of 1 and a
standard deviation of 0.5 on the log scale; and a lognormal distribution with a mean of 2
and a standard deviation of 0.5 on the log scale. The prevalence of each state is assumed to
be 1/3 and the default costs, which lead to the same results as Youden’s method for the
three-state setting (Skaltsa et al. 2012), will be used.
Threshold 1: 1.235043
Threshold 2: 4.481689
Parameters used
Prevalences: 0.3333333 0.3333333 0.3333333
Costs
C11,C12,C13: 0 1 1
C21,C22,C23: 1 0 1
C31,C32,C33: 1 1 0
As we can see from the results, the threshold estimates are 1.24 and 4.48. The object returned
by the function is of class ‘thresTH3’, which, in addition to the threshold estimates, also
contains information on the parameters used.
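These thresholds can also be recovered by minimizing the cost function numerically instead of solving equations. The base R sketch below is an independent illustration, not the algorithm used by thresTH3(): the function cost3 is hypothetical, it assumes the boundary conventions T0 = -Inf and T3 = Inf, it works with the cost per subject (n = 1), and the starting values passed to optim() are arbitrary.

```r
# Distribution functions of the three example populations
cdfs <- list(function(q) pnorm(q, 0, 1),
             function(q) plnorm(q, 1, 0.5),
             function(q) plnorm(q, 2, 0.5))
rho <- rep(1 / 3, 3)
costs <- matrix(c(0, 1, 1,
                  1, 0, 1,
                  1, 1, 0), 3, 3, byrow = TRUE)

# Expected cost per subject for thresholds th = (T1, T2); T0 = -Inf and T3 = Inf assumed
cost3 <- function(th) {
  cuts <- c(-Inf, th, Inf)
  sum(vapply(1:3, function(i) {
    rho[i] * sum(costs[i, ] * diff(vapply(cuts, cdfs[[i]], numeric(1))))
  }, numeric(1)))
}

# Direct numerical minimization; the starting values are an arbitrary choice
fit <- optim(c(1, 4), cost3, control = list(reltol = 1e-12))
round(fit$par, 3)
```

Direct minimization is slower than root finding but needs no derivatives, which makes it a convenient sanity check when the costs or distributions are changed.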
The object returned by function thres3() is of class ‘thres3’ and contains the results about
the threshold estimates, their confidence intervals and further information.
In this setting, as in the two-state setting, package ThresholdROC also contains the function
secondDer3(), which calculates the second partial derivatives of the cost function to assess
if the threshold estimates lead to a minimum in the cost function (when the derivatives are
positive) or if such a minimum does not exist (when the derivatives are close to zero).
To illustrate the usage of function thres3(), we will use three random samples of size 100:
a lognormal distribution with a mean of 0 and a standard deviation of 1 on the log scale,
and two normal distributions with means of 3 and 5, respectively, and both with a standard
deviation of 1. Prevalences are assumed to be 1/3 and default costs are used.
R> set.seed(1234)
R> n <- 100
R> k1 <- rlnorm(n)
R> k2 <- rnorm(n, 3, 1)
R> k3 <- rnorm(n, 5, 1)
R> rho <- c(1/3, 1/3, 1/3)
R> (thr3 <- thres3(k1, k2, k3, rho, dist1 = "lnorm", dist2 = "norm",
+ dist3 = "norm", ci.method = "boot"))
Estimate:
Threshold 1: 1.750509
Threshold 2: 4.102581
Parameters used:
Prevalences: 0.3333333 0.3333333 0.3333333
Costs
C11,C12,C13: 0 1 1
C21,C22,C23: 1 0 1
C31,C32,C33: 1 1 0
Confidence Level: 0.05
Distribution assumed for the first sample: lnorm(-0.16, 1)
Distribution assumed for the second sample: norm(3.04, 1.03)
Distribution assumed for the third sample: norm(5.15, 0.96)
The threshold estimates and their confidence intervals are provided in the output. As
bootstrapping was used, two confidence intervals are generated for each threshold, one based on
the normal distribution and the other on percentiles. The output of this function also displays
information about the other parameters used. Applying the plot() method to the object
returned by this function, we obtain a graph showing the estimates of the three probability
density functions and vertical lines representing the threshold estimates and their confidence
intervals (Figure 3).
R> round(secondDer3(thr3), 2)
The values obtained are positive but quite close to zero. We can also plot the contribution of
each threshold to the cost function (Figure 4).
As we can see in Figure 4, both thresholds lead to a minimum in the cost function. Furthermore,
the cost functions do not show any plateau, indicating that these minima can be
considered reliable optima.
Figure 3: Estimates of the probability density functions of the three populations. The
threshold estimates and their confidence intervals are also depicted.
Figure 4: Cost function with respect to both thresholds in the three-state example.
4. Case examples
In order to illustrate the techniques described in the previous sections and the use of the
respective R functions, we applied package ThresholdROC to two real datasets, one for each
diagnostic setting. The datasets were both analyzed in Skaltsa et al. (2010, 2012), and we
present them here for illustration purposes. Both datasets are available in the ThresholdROC
package.
Figure 5: Alzheimer’s disease data: Estimates of the probability density functions for tau
protein measurements in non-diseased and diseased groups. The threshold estimate and
its confidence intervals calculated by bootstrapping are also depicted.
Figure 6: Alzheimer’s disease data: Empirical cost function and ROC curve.
threshold estimate (shown as a red dot on the plot) leads to a minimum in the cost function.
However, the function is noticeably flat around this point, implying that any value around the
estimate of 384.45 can be considered a plausible threshold. Regarding the empirical ROC
curve, we must point out that our threshold estimate did not lead to an optimal combination
of specificity and sensitivity because the choice of costs did not lead to the same results as
those obtained with Youden’s method.
                            Costs
Prevalences    Correct classification    Incorrect classification
ρ1 = 0.24      C11 = 0                   C12 = 2    C13 = 5    C23 = 1
ρ2 = 0.58      C22 = 0                   C21 = 1    C31 = 4    C32 = 1
ρ3 = 0.18      C33 = 0
Table 1: The prevalence and cost values for the chemotherapy response dataset. 1 denotes
stable subjects, 2 the partial responders and 3 the complete responders.
Figure 7: Chemotherapy response data: SUV difference densities for patients who remained
stable, partially responded or completely responded to treatment. The threshold estimates
and their confidence intervals are also depicted.
Figure 8: Chemotherapy response data: Cost function with respect to both thresholds.
We applied Shapiro-Wilk’s test to the measurements from each population to assess the
normality of the data, obtaining p = 0.82 for the group with stable tumor, p = 0.49 for
the partial responders, and p = 0.24 for the complete responders. Thus, we could assume a
situation of trinormality. Under this assumption, using thres3(), the threshold estimates and
their 95% confidence intervals were 49.41 (43.95, 54.87) and 88.80 (76.47, 101.13). Confidence
intervals were estimated based on the parametric method. A representation of the results is
shown in Figure 7 (obtained with the plot method for ‘thres3’ objects). Evaluating the
second derivatives of the cost function at the threshold estimates through secondDer3(), we
obtained positive values (0.044 and 0.017), confirming that the estimates lead to a minimum
in the cost function. The cost function corresponding to both estimates was plotted with
plotCostROC() (Figure 8), graphically confirming that they lead to a minimum.
Using VUS() from the DiagTest3Grp package (Luo and Xiong 2012), we calculated the volume
under the surface (VUS) for the biomarker SUV to be 0.72 (95% confidence interval, [0.57, 0.88]),
underlining the high discriminatory capacity of the SUV.
5. Conclusions
The ThresholdROC package, which is publicly available from CRAN at
https://CRAN.R-project.org/package=ThresholdROC, contains a set of functions intended to provide
direct calculations of the optimum thresholds for continuous diagnostic tests using the methods
described briefly in this article and more extensively in Skaltsa et al. (2010, 2012). Here,
we have illustrated the capabilities of package ThresholdROC in estimating optimum thresholds
based on minimizing an overall cost function in two- and three-state settings. Package
ThresholdROC can also be used to calculate population-based thresholds, point estimates
and confidence intervals for both two- and three-state settings. Moreover, it provides
graphical tools related to the threshold estimates, allowing a deeper understanding of both the data
and the results obtained. Package ThresholdROC also contains a function that estimates
optimal sample sizes.
In addition to estimating optimum thresholds and sample sizes, package ThresholdROC also
includes the function diagnostic(), which calculates common measures of the accuracy
of diagnostic tests involving 2 × 2 contingency tables of classification results (usually, test
outcome versus status tables). Specifically, it calculates the following statistical measures:
sensitivity, specificity, positive and negative predictive value, positive and negative likelihood
ratio, odds ratio, Youden’s index, accuracy, error rate and appropriate confidence intervals
for each index (Zhou, Obuchowski, and McClish 2002). This can be useful in a two-state
setting when assessing the validity of a dichotomic test based on categorizing a continuous
marker using a threshold estimate.
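For reference, the point estimates of these measures follow directly from the four cell counts of the 2 × 2 table. The base R sketch below uses hypothetical counts and does not call diagnostic(); confidence intervals are omitted, and the predictive values are computed from the table itself, which implicitly assumes that the sample prevalence is the relevant one.

```r
# Hypothetical 2 x 2 counts (test outcome versus true status); not from the paper
tp <- 45; fn <- 5     # diseased subjects: positive / negative test
fp <- 10; tn <- 40    # non-diseased subjects: positive / negative test

sens <- tp / (tp + fn)
spec <- tn / (tn + fp)
measures <- c(
  sensitivity = sens,
  specificity = spec,
  ppv         = tp / (tp + fp),           # positive predictive value
  npv         = tn / (tn + fn),           # negative predictive value
  lr_positive = sens / (1 - spec),        # positive likelihood ratio
  lr_negative = (1 - sens) / spec,        # negative likelihood ratio
  odds_ratio  = (tp * tn) / (fp * fn),
  youden      = sens + spec - 1,
  accuracy    = (tp + tn) / (tp + fn + fp + tn),
  error_rate  = (fp + fn) / (tp + fn + fp + tn)
)
round(measures, 3)
```

When the table comes from dichotomizing a continuous marker at an estimated threshold, these measures describe the resulting binary test rather than the marker as a whole.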
Acknowledgments
The authors would like to thank Dr. D. Fuster for the chemotherapy response dataset used
to illustrate the ThresholdROC functions for a three-state setting. We also thank Toffa
Evans from Language Services (University of Barcelona), who improved the English text.
We are grateful to the reviewers for their valuable comments, which have led to substantial
improvements in this article.
References
Analyse-it Software, Ltd (2017). Analyse-it 4.80: Your New Go-To Statistics Package. URL
https://www.analyse-it.com/.
Kapaki E, Paraskevas GP, Zalonis I, Zournas C (2003). “CSF Tau Protein and β-Amyloid
(1-42) in Alzheimer’s Disease Diagnosis: Discrimination from Normal Ageing and the Other
Dementias in the Greek Population.” European Journal of Neurology, 10(2), 119–128. doi:
10.1046/j.1468-1331.2003.00562.x.
Luo J, Xiong C (2012). “DiagTest3Grp: An R Package for Analyzing Diagnostic Tests with
Three Ordinal Groups.” Journal of Statistical Software, 51(3), 1–24.
doi:10.18637/jss.v051.i03.
Mak TK (1993). “Solving Non-Linear Estimation Equations.” Journal of the Royal Statistical
Society B, 55(4), 945–955.
Metz C (1978). “Basic Principles of ROC Analysis.” Seminars in Nuclear Medicine, 8(4),
283–298. doi:10.1016/s0001-2998(78)80014-2.
Nakas CT, Alonzo A, Yiannoutsos CT (2010). “Accuracy and Cut-Off Point Selection in
Three-Class Classification Problems Using a Generalization of the Youden Index.” Statistics
in Medicine, 29(28), 2946–2955. doi:10.1002/sim.4044.
Pepe MS (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction.
Oxford University Press.
R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Müller M (2011). “pROC:
An Open-Source Package for R and S+ to Analyze and Compare ROC Curves.” BMC
Bioinformatics, 12(77). doi:10.1186/1471-2105-12-77.
StenStat (2017). MedRoc 2.0: Software for ROC Analysis of Biomedical Data. URL
https://stenstat.com/MedRoc/MedRoc.htm.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S. 4th edition. Springer-
Verlag, New York. doi:10.1007/978-0-387-21706-2.
Youden WJ (1950). “Index for Rating Diagnostic Tests.” Cancer, 3(1), 32–35.
doi:10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
Zhou XH, Obuchowski NA, McClish DK (2002). Statistical Methods in Diagnostic Medicine.
John Wiley & Sons. doi:10.1002/9780470317082.
Zweig MH, Campbell G (1993). “Receiver Operating Characteristics (ROC) Plots: A Fundamental
Tool in Clinical Medicine.” Clinical Chemistry, 39(4), 561–577.
Affiliation:
Sara Perez-Jaume, Konstantina Skaltsa, Josep L. Carrasco
Biostatistics. Department of Basic Clinical Practice
School of Medicine
University of Barcelona
Natàlia Pallarès
Statistics Advisory Service
Institute of Biomedical Research of Bellvitge (IDIBELL)
Gran Via de l’Hospitalet 199
08908 Hospitalet de Llobregat, Barcelona, Spain
E-mail: [email protected]