Tests For Two ROC Curves: PASS Sample Size Software
Chapter 265
Technical Details
In the following, we suppose that we have two groups of patients, those with a condition of interest (the disease)
and those without it. A patient’s classification may be known from extensive diagnosis or based on the value of
another diagnostic test. The diagnostic tests of interest are performed on each patient and the resulting test values
are recorded. At each specified cutoff value of the criterion variable, the true positive rate (TPR) and the false
positive rate (FPR) are calculated. An ROC curve is generated by plotting TPR versus FPR. The plot allows the
consequences of using various cutoff values to be evaluated. The area under the ROC curve, either for the whole
or partial range, is often used as a summary measure of the accuracy of the test.
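As a minimal sketch of the construction just described, the following Python function computes one (FPR, TPR) point per cutoff. The function name `roc_points`, the sample ratings, and the convention that a higher score means "more likely diseased" are assumptions made for illustration; they are not part of PASS.

```python
def roc_points(pos_scores, neg_scores):
    """Return (FPR, TPR) pairs, one per cutoff, in order of increasing FPR.

    Assumes a subject is called positive when its score is >= the cutoff.
    """
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every score: nobody is called positive
    for c in cutoffs:
        tpr = sum(s >= c for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= c for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical 5-point ratings for diseased and non-diseased patients.
diseased = [3, 4, 4, 5, 5]
healthy = [1, 2, 2, 3, 4]
for fpr, tpr in roc_points(diseased, healthy):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the cutoff moves the point up and to the right, which is the trade-off between sensitivity and false positives that the plot is meant to expose.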
It should be noted that TPR is similar to the statistical power of the diagnostic test at a particular cutoff value of
the criterion variable. Similarly, FPR is an estimate of the probability that the diagnostic test results in a type I
(alpha) error. Thus the ROC curve may be interpreted as a plot of the diagnostic test’s power versus its
significance level at various possible criterion cutoff values.
Users of ROC curves have developed special names for TPR and FPR. They call TPR the sensitivity of the test
and 1 - FPR the specificity of the test. Statisticians will be more familiar with using the word power instead of
sensitivity and the phrase ‘1 - alpha’ instead of specificity.
An ROC curve may be summarized by the area under it (AUC). This area has an additional interpretation.
Suppose that a rater is asked to study two subjects, one that is actually disease positive and one that is disease
negative. The AUC is equal to the probability that the rater will give the disease positive subject a higher score
than the disease negative subject. That is, the AUC is the probability that the rater will correctly order the two
subjects as to which is more likely to have the disease.
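The ordering interpretation can be checked directly by comparing every positive subject with every negative subject. The sketch below is illustrative only: the name `auc_pairwise` and the data are hypothetical, and counting a tied pair as one half is an assumption (the text above speaks only of a strictly higher score).

```python
def auc_pairwise(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs that are ordered correctly."""
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0      # positive subject scored higher
            elif p == n:
                correct += 0.5      # assumption: a tie counts as half
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical ratings.
diseased = [3, 4, 4, 5, 5]
healthy = [1, 2, 2, 3, 4]
print(auc_pairwise(diseased, healthy))  # 0.9
```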
Several methods of computing the AUC have been proposed. One method uses the trapezoidal rule to calculate
the AUC directly. Another method, called the binormal model, computes the area by fitting two normal
distributions to the data.
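The trapezoidal method can be sketched in a few lines. The function name `auc_trapezoid` and the list of empirical ROC points are hypothetical; the points are assumed to include the corners (0, 0) and (1, 1).

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # one trapezoid per segment
    return area


# Hypothetical empirical ROC points.
roc = [(0.0, 0.0), (0.0, 0.4), (0.2, 0.8), (0.4, 1.0), (0.8, 1.0), (1.0, 1.0)]
print(round(auc_trapezoid(roc), 4))  # 0.9
```

The binormal alternative replaces this piecewise-linear curve with a smooth curve determined by two parameters.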
265-1
© NCSS, LLC. All Rights Reserved.
PASS Sample Size Software NCSS.com
Tests for Two ROC Curves
$$X \sim N(\mu_-, \sigma_-^2)$$
and
$$Y \sim N(\mu_+, \sigma_+^2)$$
The partial area under the ROC curve, AUC, is defined as
$$\theta_i = \int_{c_1}^{c_2} \Phi(A_i + B_i v)\,\phi(v)\,dv$$
where $\Phi(z)$ is the cumulative normal distribution, $c_j = \Phi^{-1}(FPR_j)$, and
$$A_i = \frac{\mu_{i+} - \mu_{i-}}{\sigma_{i+}}, \qquad B_i = \frac{\sigma_{i-}}{\sigma_{i+}}$$
Note that for the full range area under the curve, $c_1 = -\infty$ and $c_2 = \infty$.
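The integral above is easy to evaluate numerically. The following sketch uses a midpoint rule; the function name `binormal_area`, the number of steps, and the replacement of the infinite limits by ±8 standard deviations are assumptions of the sketch, not PASS internals. For the full range the result can be compared with the standard closed form Φ(A / sqrt(1 + B²)).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: supplies Phi, phi, and Phi^{-1}


def binormal_area(a, b, fpr_lo=0.0, fpr_hi=1.0, steps=2000):
    """Midpoint-rule estimate of the (partial) binormal area for A=a, B=b."""
    def bound(fpr):
        # c_j = Phi^{-1}(FPR_j); the infinite limits are truncated at +/-8.
        if fpr <= 0.0:
            return -8.0
        if fpr >= 1.0:
            return 8.0
        return STD_NORMAL.inv_cdf(fpr)

    c1, c2 = bound(fpr_lo), bound(fpr_hi)
    h = (c2 - c1) / steps
    total = 0.0
    for k in range(steps):
        v = c1 + (k + 0.5) * h
        total += STD_NORMAL.cdf(a + b * v) * STD_NORMAL.pdf(v)
    return total * h


full = binormal_area(1.19, 1.0)                      # hypothetical A and B
closed_form = STD_NORMAL.cdf(1.19 / math.sqrt(2.0))  # Phi(A / sqrt(1 + B^2))
partial = binormal_area(1.19, 1.0, 0.0, 0.2)         # FPR restricted to [0, 0.2]
print(full, closed_form, partial)
```

Note that a partial area over an FPR interval of width w can never exceed w, which is why partial areas are small numbers before any rescaling.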
Maximum likelihood estimates of A and B can be computed. The variances and covariance of these MLEs can be
estimated from Fisher’s information matrix.
Define ∆ = θ1 − θ2 to be the difference in the accuracies (AUCs) of the two tests. A test of whether the two AUCs
are different amounts to testing whether ∆ = 0 . The test statistic for this test is
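$$Z = \frac{\hat{\Delta} - 0}{\sqrt{\mathrm{var}_0(\hat{\Delta})}}$$
where $\mathrm{var}_0(\hat{\Delta})$ is the variance of $\hat{\Delta}$ under the null hypothesis of equality. The above test statistic results in the
following formula for computing sample size
$$N_+ = \frac{\left[ z_\alpha \sqrt{V_0(\hat{\Delta})} + z_\beta \sqrt{V_{Alt}(\hat{\Delta})} \right]^2}{\Delta^2}$$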
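Read as N+ = [z_alpha sqrt(V0) + z_beta sqrt(V_Alt)]² / ∆², the sample size formula can be sketched as follows. The function name `n_positive`, the example variances, and the choice to round up to the next integer are assumptions made for illustration; V0 and V_Alt must be supplied from one of the variance formulas in the sections that follow.

```python
import math
from statistics import NormalDist


def n_positive(delta, v_null, v_alt, alpha=0.05, power=0.90, two_sided=True):
    """Positive-group sample size from the normal-approximation formula."""
    z = NormalDist().inv_cdf
    tail = alpha / 2.0 if two_sided else alpha   # two-sided tests split alpha
    z_alpha = z(1.0 - tail)
    z_beta = z(power)
    n = (z_alpha * math.sqrt(v_null) + z_beta * math.sqrt(v_alt)) ** 2 / delta ** 2
    return math.ceil(n)                          # assumption: round up


# Hypothetical variances, one-sided test at 80% power.
print(n_positive(0.05, 0.25, 0.25, alpha=0.05, power=0.80, two_sided=False))  # 619
```

Because ∆ enters as a square in the denominator, halving the detectable difference roughly quadruples the required N+.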
Rating Data
When the criterion values are discrete rating values, Obuchowski and McClish (1997) showed that the variances
could be calculated using
$$V_0(\hat{\Delta}) = V(\hat{\theta}_1) + V(\hat{\theta}_1) - 2C(\hat{\theta}_1, \hat{\theta}_1)$$
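where
$$V(\hat{\theta}_i) = f_i^2 \left( 1 + \frac{B_i^2}{R} + \frac{A_i^2}{2} \right) + g_i^2 B_i^2 \, \frac{1 + R}{2R}$$
$$E_{1i} = \exp\left( -\frac{A_i^2}{2 + 2B_i^2} \right)$$
$$E_{2i} = 1 + B_i^2$$
$$E_{3i} = \Phi(c_2) - \Phi(c_1)$$
$$E_{4i} = \exp\left( -\frac{c_1^2}{2} \right) - \exp\left( -\frac{c_2^2}{2} \right)$$
$$c_j = \left[ \Phi^{-1}(FPR_j) + \frac{A_j B_j}{1 + B_j^2} \right] \sqrt{1 + B_j^2}$$
$$R = \frac{N_-}{N_+}$$
$$A_i = B_i \, \Phi^{-1}(TNR_i) - \Phi^{-1}(FPR_i)$$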
r− and r+ are the correlations between the results of the two diagnostic tests for normal and abnormal patients,
respectively. For the most conservative results, set Bi = 1 .
Continuous Data
When the criterion values are continuous, Obuchowski (1998) suggests that the following formulas of Hanley and
McNeil (1983) are more appropriate. Note that these formulas cannot be used for evaluating the AUC for a partial
range.
$$V(\hat{\Delta}) = V(\hat{\theta}_1) + V(\hat{\theta}_2) - 2C(\hat{\theta}_1, \hat{\theta}_2)$$
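where
$$V(\hat{\theta}_i) = \frac{\theta_i}{R(2 - \theta_i)} + \frac{2\theta_i^2}{1 + \theta_i} - \theta_i^2 \, \frac{1 + R}{R}$$
$$C(\hat{\theta}_1, \hat{\theta}_2) = r \sqrt{V(\hat{\theta}_1) \, V(\hat{\theta}_2)}$$
and r, the correlation between the two estimated areas, is derived from a special table provided by Hanley and McNeil (1983).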
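A minimal sketch of the continuous-data calculation follows. It reads the variance as V(θ) = θ / (R(2 − θ)) + 2θ² / (1 + θ) − θ²(1 + R) / R and takes the covariance as r times the square root of the product of the two variances. The function names, the rounding up, and the choice to evaluate the null variance with both tests at AUC1 are assumptions of the sketch, so its result need not agree exactly with PASS output.

```python
import math
from statistics import NormalDist


def hm_variance(theta, alloc_ratio=1.0):
    """Variance term for one AUC, with alloc_ratio = N-/N+."""
    return (theta / (alloc_ratio * (2.0 - theta))
            + 2.0 * theta ** 2 / (1.0 + theta)
            - theta ** 2 * (1.0 + alloc_ratio) / alloc_ratio)


def hm_n_positive(auc1, auc2, alpha=0.05, power=0.80, r=0.0,
                  alloc_ratio=1.0, two_sided=False):
    """Positive-group sample size for comparing two full-range AUCs."""
    v1 = hm_variance(auc1, alloc_ratio)
    v2 = hm_variance(auc2, alloc_ratio)
    v_alt = v1 + v2 - 2.0 * r * math.sqrt(v1 * v2)
    # Assumption: under the null hypothesis both tests have area auc1.
    v_null = 2.0 * v1 - 2.0 * r * v1
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - (alpha / 2.0 if two_sided else alpha))
    z_beta = z(power)
    delta = auc2 - auc1
    n = (z_alpha * math.sqrt(v_null) + z_beta * math.sqrt(v_alt)) ** 2 / delta ** 2
    return math.ceil(n)


# Independent samples (r = 0), equal allocation, one-sided alpha of 0.05.
print(hm_n_positive(0.70, 0.75, power=0.80))
```

Setting r above zero, as in a paired design where every patient receives both tests, reduces both variances of the difference and therefore the required sample size.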
Procedure Options
This section describes the options that are specific to this procedure. These are located on the Design tab. For
more information about the options of other tabs, go to the Procedure Window chapter.
Design Tab
The Design tab contains most of the parameters and options that you will be concerned with.
Solve For
Solve For
This option specifies the parameter to be solved for from the other parameters. Under most situations, you will
select either Power or Sample Size (N+).
Select Sample Size (N+) when you want to calculate the sample size needed to achieve a given power and alpha
level.
Select Power when you want to calculate the power of an experiment that has already been run.
Test
Alternative Hypothesis
Specify whether the test is one-sided or two-sided. When a two-sided test is selected, the value of alpha is divided
by two.
Note that most researchers assume that, unless stated otherwise, all statistical tests are two-sided. If you use a one-
sided test, you should clearly state and justify this in all reports.
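The effect of dividing alpha by two can be seen in the critical value of the z-test. This is a small illustration using Python's standard library; the function name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha, two_sided=True):
    """Upper critical value of a z-test at significance level alpha."""
    tail = alpha / 2.0 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)


print(round(critical_z(0.05, two_sided=True), 3))   # 1.96
print(round(critical_z(0.05, two_sided=False), 3))  # 1.645
```

The larger two-sided critical value is why, at the same alpha and power, a two-sided test needs a larger sample than a one-sided test.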
Alpha
This option specifies one or more values for the probability of a type-I error. A type-I error occurs when a true
null hypothesis is rejected.
Values must be between zero and one. Historically, the value of 0.05 has been used for alpha. This means that
about one test in twenty will falsely reject the null hypothesis. You should pick a value for alpha that represents
the risk of a type-I error you are willing to take in your experimental situation.
You may enter a range of values such as 0.01 0.05 0.10 or 0.01 to 0.10 by 0.01.
Correlation-
This is the correlation between the two diagnostic-test scores for the negative group. Although correlations can
range between -1 and 1, typical values are from 0.3 to 0.6.
Note that if you want to analyze a design in which a separate set of patients receive each diagnostic test, this may
be done by setting this correlation value to 0.
• Continuous
The test results are from a continuum of possible values. The Hanley and McNeil (1983) variance formulas
are used. Note that this option does not allow a partial range of FPR values to be analyzed.
• Discrete (Ratings)
The test results are from a small set of rating values such as 1, 2, 3, 4, 5. The Obuchowski & McClish (1997)
variance formulas are used.
B1 (SD Ratio)
B1 is the ratio of the standard deviation of the negative group to the positive group (SD-/SD+) for diagnostic test
1. That is, assuming the binormal model,
$$B_1 = \frac{\sigma_{1-}}{\sigma_{1+}}$$
Note that this parameter is ignored for continuous data.
Although B1 can be any positive number, typical values are between 0.3 and 3.0. Obuchowski suggests that if the
value of B1 is not known, a value of 1.0 be used, since this will result in a conservative (extra large) sample size.
She reports that in her experience, typical values are much less than 1.0, often near 0.3.
B2 (SD Ratio)
B2 is the ratio of the standard deviation of the negative group to the positive group (SD-/SD+) for diagnostic test
2. That is, assuming the binormal model,
$$B_2 = \frac{\sigma_{2-}}{\sigma_{2+}}$$
Note that this parameter is ignored for continuous data.
Although B2 can be any positive number, typical values are between 0.3 and 3.0. Obuchowski suggests that if the
value of B2 is not known, a value of 1.0 be used, since this will result in a conservative (extra large) sample size.
She reports that in her experience, typical values are much less than 1.0, often near 0.3.
Setup
This section presents the values of each of the parameters needed to run this example. First, from the PASS Home
window, load the Tests for Two ROC Curves procedure window by clicking on ROC, and then clicking on
Tests for Two ROC Curves. You may then make the appropriate entries as listed below, or open Example 1 by
going to the File menu and choosing Open Example Template.
Option Value
Design Tab
Solve For ................................................ Power
Alternative Hypothesis ............................ Two-Sided Test
Alpha ....................................................... 0.05
Group Allocation ..................................... Enter N+ and R, where N- = R x N+
N+ (Size of Positive Group) .................... 20 50 100 250 500 1000 2000
R (Sample Allocation Ratio) ................... 2
AUC1 (Area Under Curve 1) ................... 0.80
AUC2 (Area Under Curve 2) ................... 0.825 0.85 0.9
Lower FPR .............................................. 0.00
Upper FPR .............................................. 1.00
Correlation+ ............................................ 0.6
Correlation- ............................................. 0.6
Type of Data ........................................... Discrete (Ratings)
B1 (SD Ratio) .......................................... 1
B2 (SD Ratio) .......................................... 1
Annotated Output
Click the Calculate button to perform the calculations and generate the following output.
Numeric Report
Numeric Results for Testing AUC1 = AUC2 with Discrete (Rating) Data
Test Type = Two-Sided. FPR1 = 0.0. FPR2 = 1.0. B1 = 1.000. B2 = 1.000. Allocation Ratio = 2.000.
Power    N+     N-     N      Target R  Actual R  AUC1'   AUC2'   Diff'   AUC1    AUC2    Diff    Alpha
0.0501   20     40     60     2.0       2.0       0.8000  0.8250  0.0250  0.8000  0.8250  0.0250  0.050
0.0733   50     100    150    2.0       2.0       0.8000  0.8250  0.0250  0.8000  0.8250  0.0250  0.050
0.1084   100    200    300    2.0       2.0       0.8000  0.8250  0.0250  0.8000  0.8250  0.0250  0.050
0.2104   250    500    750    2.0       2.0       0.8000  0.8250  0.0250  0.8000  0.8250  0.0250  0.050
0.3744   500    1000   1500   2.0       2.0       0.8000  0.8250  0.0250  0.8000  0.8250  0.0250  0.050
0.6426   1000   2000   3000   2.0       2.0       0.8000  0.8250  0.0250  0.8000  0.8250  0.0250  0.050
0.9090   2000   4000   6000   2.0       2.0       0.8000  0.8250  0.0250  0.8000  0.8250  0.0250  0.050
(report continues)
Report Definitions
Power is the probability of rejecting a false null hypothesis.
N+ and N- are the number of items sampled from each population.
N is the total sample size, N+ + N-.
Target R is the desired ratio (or ratios) of R entered in the procedure. R is the ratio of N- to N+, so that
N- = R × N+.
Actual R is the value for R obtained in this scenario. Because N+ and N- are discrete, this value is sometimes
slightly different than the target R.
AUC1' and AUC2' are the adjusted areas under the ROC curve for diagnostic tests 1 and 2, respectively.
Diff' is AUC2' - AUC1'. This is the adjusted difference to be detected.
AUC1 and AUC2 are the actual areas under the ROC curve for diagnostic tests 1 and 2, respectively.
Diff is AUC2 - AUC1. This is the difference to be detected.
Alpha is the probability of rejecting a true null hypothesis.
FPR1, FPR2 are the lower and upper bounds on the false positive rates.
B1 and B2 are the ratios of the standard deviations of the negative and positive groups for each test.
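The relationship among N+, N-, N, Target R, and Actual R in these definitions can be illustrated as follows. The function name `allocate` and the rule of rounding N- up to the next whole subject are assumptions of this sketch; the rounding rule PASS applies is not stated here.

```python
import math


def allocate(n_pos, target_r):
    """Return (N-, N, actual R) for a positive-group size and a target ratio."""
    # Assumption: round N- up; the small tolerance guards against
    # floating-point noise such as 1.1 * 10 = 11.000000000000002.
    n_neg = math.ceil(target_r * n_pos - 1e-9)
    return n_neg, n_pos + n_neg, n_neg / n_pos


print(allocate(20, 2.0))   # (40, 60, 2.0)
print(allocate(21, 1.5))   # N- = 32, N = 53, actual R slightly above 1.5
```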
Summary Statements
A sample of 20 from the positive group and 40 from the negative group achieve 5% power to
detect a difference of 0.0250 between a diagnostic test with an area under the ROC curve (AUC)
of 0.8000 and another diagnostic test with an AUC of 0.8250 using a two-sided z-test at a
significance level of 0.0500. The data are discrete (rating scale) responses. The AUC is
computed between false positive rates of 0.000 and 1.000. The ratio of the standard deviation
of the responses in the negative group to the standard deviation of the responses in the
positive group for diagnostic test 1 is 1.000 and for diagnostic test 2 is 1.000. The
correlation between the two diagnostic tests is assumed to be 0.600 for the positive group and
0.600 for the negative group.
This report shows the power for each of the sample sizes. Most of the definitions are standard. However, a special
explanation must be given for AUC and AUC’.
AUC’
This is the adjusted area under the curve. A rescaling, discussed earlier, has been applied so that the minimum
area is 0.5 and the maximum area is 1.0.
AUC
This is the actual area under the curve. This value will equal the adjusted area when the FPR range is set from 0.0
to 1.0. Otherwise, these values will be different.
Plots Section
These plots show the power versus the sample size for the three values of AUC2.
Setup
This section presents the values of each of the parameters needed to run this example. First, from the PASS Home
window, load the Tests for Two ROC Curves procedure window by clicking on ROC, and then clicking on
Tests for Two ROC Curves. You may then make the appropriate entries as listed below, or open Example 2 by
going to the File menu and choosing Open Example Template.
Option Value
Design Tab
Solve For ................................................ Sample Size
Alternative Hypothesis ............................ Two-Sided Test
Power ...................................................... 0.90
Alpha ....................................................... 0.05
Group Allocation ..................................... Enter R = N-/N+, solve for N+ and N-
R (Sample Allocation Ratio) ................... 2
AUC1 (Area Under Curve 1) ................... 0.80
AUC2 (Area Under Curve 2) ................... 0.825 0.85 0.9
Lower FPR .............................................. 0.00
Upper FPR .............................................. 1.00
Correlation+ ............................................ 0.6
Correlation- ............................................. 0.6
Type of Data ........................................... Discrete (Ratings)
B1 (SD Ratio) .......................................... 1
B2 (SD Ratio) .......................................... 1
Output
Click the Calculate button to perform the calculations and generate the following output.
Numeric Results
Numeric Results for Testing AUC1 = AUC2 with Discrete (Rating) Data
Test Type = Two-Sided. FPR1 = 0.0. FPR2 = 1.0. B1 = 1.000. B2 = 1.000. Allocation Ratio = 2.000.
This report shows the sample size needed to achieve 90% power for each value of AUC2.
Setup
This section presents the values of each of the parameters needed to run this example. First, from the PASS Home
window, load the Tests for Two ROC Curves procedure window by clicking on ROC, and then clicking on
Tests for Two ROC Curves. You may then make the appropriate entries as listed below, or open Example 3 by
going to the File menu and choosing Open Example Template.
Option Value
Design Tab
Solve For ................................................ Sample Size
Alternative Hypothesis ............................ Two-Sided Test
Power ...................................................... 0.90
Alpha ....................................................... 0.05
Group Allocation ..................................... Enter R = N-/N+, solve for N+ and N-
R (Sample Allocation Ratio) ................... 2
AUC1 (Area Under Curve 1) ................... 0.80
AUC2 (Area Under Curve 2) ................... 0.825 0.85 0.9
Lower FPR .............................................. 0.00
Upper FPR .............................................. 0.20
Correlation+ ............................................ 0.6
Correlation- ............................................. 0.6
Type of Data ........................................... Discrete (Ratings)
B1 (SD Ratio) .......................................... 1
B2 (SD Ratio) .......................................... 1
Output
Click the Calculate button to perform the calculations and generate the following output.
Numeric Results
Numeric Results for Testing AUC1 = AUC2 with Discrete (Rating) Data
Test Type = Two-Sided. FPR1 = 0.0. FPR2 = 0.200. B1 = 1.000. B2 = 1.000. Allocation Ratio = 2.000.
Note that the necessary sample size has more than doubled.
Setup
This section presents the values of each of the parameters needed to run this example. First, from the PASS Home
window, load the Tests for Two ROC Curves procedure window by clicking on ROC, and then clicking on
Tests for Two ROC Curves. You may then make the appropriate entries as listed below, or open Example 4 by
going to the File menu and choosing Open Example Template.
Option Value
Design Tab
Solve For ................................................ Sample Size
Alternative Hypothesis ............................ Two-Sided Test
Power ...................................................... 0.80
Alpha ....................................................... 0.05
Group Allocation ..................................... Enter R = N-/N+, solve for N+ and N-
R (Sample Allocation Ratio) ................... 2
AUC1 (Area Under Curve 1) ................... 0.922222
AUC2 (Area Under Curve 2) ................... 0.819444
Lower FPR .............................................. 0.00
Upper FPR .............................................. 0.20
Correlation+ ............................................ 0.6
Correlation- ............................................. 0.6
Type of Data ........................................... Discrete (Ratings)
B1 (SD Ratio) .......................................... 1
B2 (SD Ratio) .......................................... 1
Output
Click the Calculate button to perform the calculations and generate the following output.
Numeric Results
Numeric Results for Testing AUC1 = AUC2 with Discrete (Rating) Data
Test Type = Two-Sided. FPR1 = 0.0. FPR2 = 0.200. B1 = 1.000. B2 = 1.000. Allocation Ratio = 2.000.
Note that the sample sizes of 109 and 218 match exactly with the results of Obuchowski.
Setup
This section presents the values of each of the parameters needed to run this example. First, from the PASS Home
window, load the Tests for Two ROC Curves procedure window by clicking on ROC, and then clicking on
Tests for Two ROC Curves. You may then make the appropriate entries as listed below, or open Example 5 by
going to the File menu and choosing Open Example Template.
Option Value
Design Tab
Solve For ................................................ Sample Size
Alternative Hypothesis ............................ One-Sided Test
Power ...................................................... 0.8 0.9 0.95
Alpha ....................................................... 0.05
Group Allocation ..................................... Equal (N+ = N-)
AUC1 (Area Under Curve 1) ................... 0.7
AUC2 (Area Under Curve 2) ................... 0.75
Lower FPR .............................................. 0.00
Upper FPR .............................................. 1.00
Correlation+ ............................................ 0.0
Correlation- ............................................. 0.0
Type of Data ........................................... Continuous
Output
Click the Calculate button to perform the calculations and generate the following output.
Numeric Results 1
Numeric Results for Testing AUC1 = AUC2 with Continuous Data
Test Type = One-Sided. FPR1 = 0.0. FPR2 = 1.0. B1 = 1.000. B2 = 1.000. Allocation Ratio = 1.000.
Target Power  Actual Power  N+    N-    N     AUC1'   AUC2'   Diff'   AUC1    AUC2    Diff    Alpha
0.80          0.8003        652   652   1304  0.7000  0.7500  0.0500  0.7000  0.7500  0.0500  0.050
0.90          0.9001        897   897   1794  0.7000  0.7500  0.0500  0.7000  0.7500  0.0500  0.050
0.95          0.9501        1129  1129  2258  0.7000  0.7500  0.0500  0.7000  0.7500  0.0500  0.050
Note that the sample sizes of 897 and 652 match exactly with the results of Hanley and McNeil. The 1129 is two
less than their 1131. This difference may be due to refinements in computing the normal probability distribution
used in PASS. You can compare these sample sizes by calculating the power.
Numeric Results 2
Numeric Results for Testing AUC1 = AUC2 with Continuous Data
Test Type = One-Sided. FPR1 = 0.0. FPR2 = 1.0. B1 = 1.000. B2 = 1.000. Allocation Ratio = 1.000.
Note that the power for 1129 is 0.9501 while the power for 1131 is 0.9505. This is only a slight difference and
explains why this value showed up in their table.