83_ROCCurves
1) Introduction
- Sensitivity = True Positive rate (proportion of patients with the disease who are correctly
diagnosed as having it).
- Specificity = True Negative rate (proportion of patients without the disease who are correctly
diagnosed as not having it).
- 1-Specificity = False Positive rate (proportion of patients without the disease who are
incorrectly diagnosed as having the disease).
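Here are the three rates above written in terms of the four cells of a 2×2 classification table. The notation is introduced here for illustration and is not SPSS output: TP = true positives, FN = false negatives, FP = false positives, TN = true negatives.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
1 - \text{Specificity} &= \frac{FP}{FP + TN}
\end{aligned}
```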
In a ROC (Receiver Operating Characteristic) curve, the True Positive rate (TPR = Sensitivity) is
plotted as a function of the False Positive rate (FPR = 1 - Specificity), with one point for each
possible cut-off value of the test.
When the variable under study cannot distinguish between the two groups, i.e. when there
is no difference between the two distributions, the area under the ROC curve (AUC) equals 0.5
and the ROC curve coincides with the diagonal (see the blue line in Figure 1). When the values
of the two groups are perfectly separated, i.e. the distributions do not overlap, the AUC
equals 1 and the ROC curve reaches the upper left corner of the plot (see the yellow line in
Figure 1).
Figure 1. Example of ROC curves.
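The area under the ROC curve (AUC) can also be read as a probability. It is the chance that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. This is the Mann-Whitney formulation. The minimal Python sketch below uses made-up scores to reproduce the two extremes described above: 1.0 for perfectly separated groups and 0.5 for identical distributions. We assume this matches the nonparametric area that SPSS reports. SPSS's own computation is not shown in this document.

```python
from itertools import product


def nonparametric_auc(positive_scores, negative_scores):
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as one half (Mann-Whitney form of the AUC)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one observation")
    total = 0.0
    for p, n in product(positive_scores, negative_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(positive_scores) * len(negative_scores))


# Hypothetical scores, for illustration only.
print(nonparametric_auc([60, 70, 80], [20, 30, 40]))  # perfect separation -> 1.0
print(nonparametric_auc([20, 30, 40], [20, 30, 40]))  # identical distributions -> 0.5
```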
3) Assumptions
In order to perform ROC curve analysis, you should meet the following three assumptions:
The outcome variable ("outcome") is dichotomous and records whether or not each participant
had a major depressive episode within 6 months after completing the instrument that
produced their depression score, "DepressionScore" (the measurement of interest). We expect
a higher DepressionScore to be associated with a positive outcome. The state variable is
"outcome", coded 0 for the positive outcome (a major depressive episode within the six
months after measurement) and 1 for the negative outcome.
5) Procedure in SPSS
Click Analyze > ROC Curve... on the main menu (as shown below).
In the ROC Curve dialog box (see Figure 4), put the measurement of interest
(DepressionScore) in “Test Variable” and the state variable (outcome) in “State Variable”.
In “Value of State Variable”, enter the value that indicates the event has occurred, i.e. the
positive outcome: 0. We then recommend ticking all the options below, in particular
“ROC Curve” and “With diagonal reference line”. In the Options dialog, make sure that the
confidence level under Parameters for Standard Error of Area is set to 95%.
Figure 4. Options to select for the ROC curves analysis.
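The Value of State Variable decides which group counts as "positive". Here is a sketch with hypothetical data. The scores are invented, and the helper `auc_for_state` is ours, not an SPSS function. Swapping the state value from 0 to 1 turns the nonparametric area into 1 - AUC. This is why entering the wrong value can make a useful test look worse than chance.

```python
def auc_for_state(scores, outcomes, positive_value):
    """Nonparametric AUC when cases with outcome == positive_value form the
    positive (event) group and larger scores indicate a positive test."""
    pos = [s for s, o in zip(scores, outcomes) if o == positive_value]
    neg = [s for s, o in zip(scores, outcomes) if o != positive_value]
    # Each (positive, negative) pair scores 1 if ordered correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: outcome 0 = major depressive episode (positive), 1 = none.
scores = [52, 61, 47, 40, 45, 33, 38, 49]
outcomes = [0, 0, 0, 0, 1, 1, 1, 1]

auc_event_is_0 = auc_for_state(scores, outcomes, positive_value=0)
auc_event_is_1 = auc_for_state(scores, outcomes, positive_value=1)
print(auc_event_is_0, auc_event_is_1)  # 0.8125 0.1875
```

With these eight made-up cases, the two areas add up to 1.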
6) Results
In Figure 5, the blue line plots the true positive rate against the false positive rate. The
larger the area under this curve, the better the depression score discriminates between
participants with and without a positive outcome.
The significance level (p-value) is the probability of observing a sample area under the ROC
curve at least as extreme as the one obtained if, in fact, the true (population) area under the
ROC curve is 0.5 (null hypothesis: area = 0.5). If p is small (p < 0.05), we can conclude that
the area under the ROC curve is significantly different from 0.5, and therefore that there is
evidence that the test can distinguish between the two groups. The p-value shown in the
table in Figure 6 is 0.001, which indicates very strong evidence against the null hypothesis.
We therefore conclude that the depression score significantly predicts the outcome variable.
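As a rough illustration of where such a p-value comes from, the sketch below tests H0: AUC = 0.5. It uses the normal approximation to the Mann-Whitney statistic, whose null standard error is sqrt((n_pos + n_neg + 1) / (12 n_pos n_neg)). It ignores the tie correction, and SPSS may use a different standard error, so the result need not match SPSS exactly. The AUC of 0.80 and the group sizes of 13 and 19 are hypothetical inputs.

```python
import math


def auc_null_test(auc, n_pos, n_neg):
    """Two-sided p-value for H0: true AUC = 0.5, using the normal approximation
    to the Mann-Whitney statistic (no tie correction)."""
    se_null = math.sqrt((n_pos + n_neg + 1) / (12.0 * n_pos * n_neg))
    z = (auc - 0.5) / se_null
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


z, p = auc_null_test(0.80, n_pos=13, n_neg=19)  # hypothetical inputs
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```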
The 95% confidence interval gives a range of plausible values for the true (population) area
under the ROC curve. If the study were repeated many times, about 95% of intervals built this
way would contain the true area.
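One common way to build such an interval is a Wald interval, AUC ± 1.96 × SE, with the standard error from Hanley and McNeil (1982). The sketch below assumes that approach. SPSS computes its own standard error under the nonparametric assumption, so its bounds may differ. The function name and inputs are illustrative.

```python
import math
from statistics import NormalDist


def hanley_mcneil_ci(auc, n_pos, n_neg, level=0.95):
    """Wald-type confidence interval for the AUC using the Hanley & McNeil
    standard error; the bounds are clipped to [0, 1]."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(variance)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% level
    return se, max(0.0, auc - z * se), min(1.0, auc + z * se)


se, lower, upper = hanley_mcneil_ci(0.80, n_pos=13, n_neg=19)  # hypothetical inputs
print(f"SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```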
Area Under the Curve
Test Result Variable(s): DepressionScore
Note: the test result variable DepressionScore has at least one tie between the positive actual
state group and the negative actual state group. Statistics may be biased.
a. Under the nonparametric assumption.
b. Null hypothesis: true area = 0.5.
A cut-off score is not necessarily easy to determine. In Figure 7, if we take 46.5000 as our
cut-off value, the sensitivity is 84.6% and the false positive rate is 31.6% (i.e. a specificity of
68.4%). Suppose participants with a depression score greater than or equal to 46.5000 are
classified as positive. Then 84.6% of the participants who had a positive outcome would be
correctly identified. On the other hand, 31.6% of the participants who had a negative outcome
would be incorrectly classified as positive. Choosing the cut-off score therefore involves a
trade-off between two proportions:
- the proportion of positive outcomes correctly predicted (sensitivity);
- the proportion of negative outcomes incorrectly predicted as positive (1 - specificity).
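The classification rule behind each row of the coordinates table can be sketched as follows. As in the text, we assume a case is called positive when its score is greater than or equal to the cut-off, and that outcome 0 is the positive state. The data and the helper `rates_at_cutoff` are hypothetical.

```python
def rates_at_cutoff(scores, outcomes, cutoff, positive_value=0):
    """Sensitivity and false positive rate when a case is classified as
    positive if its score is greater than or equal to `cutoff`."""
    tp = fn = fp = tn = 0
    for score, outcome in zip(scores, outcomes):
        predicted_positive = score >= cutoff
        actually_positive = outcome == positive_value
        if actually_positive and predicted_positive:
            tp += 1
        elif actually_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return sensitivity, false_positive_rate


# Hypothetical data (outcome 0 = episode occurred, i.e. the positive state).
scores = [52, 61, 47, 40, 45, 33, 38, 49]
outcomes = [0, 0, 0, 0, 1, 1, 1, 1]
print(rates_at_cutoff(scores, outcomes, cutoff=46.5))  # (0.75, 0.25)
```

With these eight made-up cases, lowering the cut-off can only raise (or leave unchanged) both rates. That is the trade-off described above.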
The choice of the cut-off value is quite subjective and depends on which error rates you are
willing to tolerate. We advise you to look at the ROC curve and pick the point on the curve
that is furthest away from the diagonal reference line.
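The perpendicular distance from a ROC point (FPR, TPR) to the diagonal TPR = FPR is (TPR - FPR)/√2. So the point furthest above the diagonal is the one that maximizes Youden's index, J = Sensitivity - (1 - Specificity). The sketch below applies that rule to a list of (cut-off, sensitivity, 1 - specificity) rows. Only the 46.5000 row comes from this document; the other rows are invented for illustration.

```python
import math


def furthest_from_diagonal(rows):
    """Return the (cutoff, sensitivity, fpr) row whose ROC point lies furthest
    above the diagonal, i.e. the row maximizing Youden's J = sensitivity - fpr."""
    return max(rows, key=lambda row: row[1] - row[2])


# Only the 46.5 row comes from the text; the other rows are hypothetical.
coordinates = [
    (40.5, 0.923, 0.579),
    (46.5, 0.846, 0.316),
    (52.5, 0.538, 0.158),
]
best = furthest_from_diagonal(coordinates)
cutoff, sens, fpr = best
youden_j = sens - fpr
distance = youden_j / math.sqrt(2)  # perpendicular distance to the diagonal
print(cutoff, round(youden_j, 3), round(distance, 3))
```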
The furthest point corresponds to the coordinates (0.316, 0.846), which gives the cut-off
value 46.5000 in the table below:
Positive if Greater Than or Equal To | Sensitivity | 1 - Specificity
...
46.5000 | .846 | .316
...
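The classification table built from the predictions (positive/negative) obtained with 46.5000
as the cut-off value contains 11 true positives, 2 false negatives, 6 false positives and 13 true
negatives. The accuracy is therefore (11+13)/(11+2+6+13) = 24/32 = 0.75.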
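As a quick consistency check, the sketch below recomputes sensitivity, specificity and accuracy from the four counts. It assumes that 11, 2, 6 and 13 in the accuracy calculation are the true positives, false negatives, false positives and true negatives, respectively. This is the only arrangement that reproduces the 84.6% sensitivity and 68.4% specificity quoted above.

```python
# 2x2 counts for the 46.5000 cut-off (cell assignment as assumed in the lead-in).
tp, fn = 11, 2   # actual positives (outcome = 0): predicted positive / negative
fp, tn = 6, 13   # actual negatives (outcome = 1): predicted positive / negative

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + fp + tn)

# prints: sensitivity=0.846 specificity=0.684 accuracy=0.75
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} accuracy={accuracy:.2f}")
```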