MAT1031
Bio-Statistics
Module 5
ROC Curve Analysis
Syllabus
ROC Curve Analysis

Receiver Operating Characteristic (ROC) Analysis

A ROC curve is a plot of the true positive rate (sensitivity) as a function of the false positive rate (100 − specificity) for different cut-off points of a parameter. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between two diagnostic groups (diseased/normal).
ROC Curve Analysis (Continuation)

The ROC curve is a graph showing the true positive rate on the vertical axis and the false positive rate on the horizontal axis, as the classification threshold t varies. It is a single curve summarizing the information in the cumulative distribution functions of the scores of the two classes. One can think of it as a complete representation of classifier performance as the choice of the classification threshold t varies.
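To make the construction concrete, the short Python sketch below (not part of the original slides; the marker values are invented) computes one sensitivity/specificity pair, i.e. one (fp, tp) point of the ROC curve, for each candidate cut-off of a hypothetical diagnostic marker.

```python
import numpy as np

# Hypothetical marker values (higher values suggest disease); invented data
diseased = np.array([6.2, 7.1, 5.8, 8.3, 6.9, 7.5])
normal = np.array([4.1, 5.2, 3.9, 5.9, 4.8, 6.5])

# Each candidate cut-off gives one sensitivity/specificity pair,
# i.e. one point (fp, tp) on the ROC curve.
for t in sorted(np.concatenate([diseased, normal])):
    sens = np.mean(diseased >= t)   # true positive rate at this cut-off
    spec = np.mean(normal < t)      # true negative rate at this cut-off
    print(f"cut-off {t:.1f}: tp = {sens:.2f}, fp = {1 - spec:.2f}")
```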
ROC Curve Analysis (Continuation)

The diagnostic performance of a test, that is, the accuracy of a test in discriminating diseased cases from normal cases, is evaluated using Receiver Operating Characteristic (ROC) curve analysis (Metz, 1978; Zweig & Campbell, 1993). ROC curves can also be used to compare the diagnostic performance of two or more laboratory or diagnostic tests (Griner et al., 1981).

When you consider the results of a particular test in two populations, one population with a disease and the other population without the disease, you will rarely observe a perfect separation between the two groups. Indeed, the distributions of the test results will overlap, as shown in the figure.
ROC Curve Analysis (Continuation)

Figure: overlapping distributions of the test result in the diseased and non-diseased groups, with the criterion (cut-off) value marked on the test-result axis.
ROC Curve Analysis (Continuation)

For every possible cut-off point or criterion value you select to discriminate between the two populations, there will be some cases with the disease correctly classified as positive (TP = true positive fraction), but some cases with the disease will be classified as negative (FN = false negative fraction). On the other hand, some cases without the disease will be correctly classified as negative (TN = true negative fraction), but some cases without the disease will be classified as positive (FP = false positive fraction).

Schematic Outcomes of a Test
Test result | Disease present      | Disease absent
Positive    | True Positive (TP)   | False Positive (FP)
Negative    | False Negative (FN)  | True Negative (TN)
Sensitivity and Specificity

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
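A minimal Python sketch of these two formulas, using made-up counts for illustration:

```python
# Hypothetical counts from a 2x2 table of test outcomes
TP, FN = 45, 5    # diseased cases: correctly / incorrectly classified
TN, FP = 80, 20   # non-diseased cases: correctly / incorrectly classified

sensitivity = TP / (TP + FN)   # probability a diseased case tests positive
specificity = TN / (TN + FP)   # probability a non-diseased case tests negative

print(f"Sensitivity = {sensitivity:.2f}")   # 0.90
print(f"Specificity = {specificity:.2f}")   # 0.80
```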
Sensitivity and Specificity

Suppose that t is the value of the threshold T in a particular classification rule, so that an individual is allocated to population P if its classification score s exceeds t and otherwise to population N. In order to assess the efficacy of this classifier we need to calculate the probability of making an incorrect allocation. Such a probability tells us the rate at which future individuals requiring classification will be misallocated. More specifically, we can define four probabilities and their associated rates for the classifier.
Sensitivity and Specificity

1. the probability that an individual from P is correctly classified, i.e., the true positive rate tp = p(s > t | P);
2. the probability that an individual from N is misclassified, i.e., the false positive rate fp = p(s > t | N);
3. the probability that an individual from N is correctly classified, i.e., the true negative rate tn = p(s ≤ t | N); and
4. the probability that an individual from P is misclassified, i.e., the false negative rate fn = p(s ≤ t | P).
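These four rates can be estimated from a sample of scores. The sketch below uses simulated scores and an arbitrary threshold t (all names and values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
scores_P = rng.normal(2.0, 1.0, 500)   # scores of individuals from population P
scores_N = rng.normal(0.0, 1.0, 500)   # scores of individuals from population N
t = 1.0                                # classification threshold

tp = np.mean(scores_P > t)    # p(s > t | P), true positive rate
fp = np.mean(scores_N > t)    # p(s > t | N), false positive rate
tn = np.mean(scores_N <= t)   # p(s <= t | N), true negative rate
fn = np.mean(scores_P <= t)   # p(s <= t | P), false negative rate

print(tp, fp, tn, fn)         # note that tp + fn = 1 and fp + tn = 1
```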
Sensitivity and Specificity

Given the probability densities p(s | P) and p(s | N) and the value t, numerical values lying between 0 and 1 can be obtained readily for these four rates, and this gives a full description of the performance of the classifier. Clearly, for good performance we require high "true" rates and low "false" rates.

However, this is for a particular choice of threshold t, and the best choice of t is not generally known in advance but must be determined as part of the classifier construction. Varying t and evaluating all four quantities above will clearly give full information on which to base this decision and hence to assess the performance of the classifier, but since tp + fn = 1 and fp + tn = 1 we do not need all four quantities to convey this information. The ROC curve provides a more easily digestible summary. It is the curve obtained on varying t, where fp is the value on the horizontal axis (abscissa) and tp the value on the vertical axis (ordinate).

Sensitivity and Specificity
Let us consider the extremes. The classifier will be least successful when the two populations are exactly the same, so that p(s | P) = p(s | N) = p(s), say. In such a case the probability of allocating an individual to population P is the same whether that individual has come from P or from N, the exact value of this probability depending on the threshold value t. So in this case, as t varies, tp will always equal fp and the ROC curve will be the straight line joining the points (0,0) and (1,1). This line is usually called the chance diagonal, as it represents essentially random allocation of individuals to one of the two populations.

The figure shows three such curves plus the chance diagonal. The solid curve corresponds to the best classifier, because at any fp value it has higher tp than all the others, while at any tp value it has lower fp.
Sensitivity and Specificity

Figure: Three ROC curves, plus the chance diagonal (fp on the horizontal axis, tp on the vertical axis).
ROC - Area Under the Curve (AUC)

The ROC curve is plotted with TPR against FPR, where TPR is on the y-axis and FPR is on the x-axis. The area under a receiver operating characteristic (ROC) curve, abbreviated as AUC, is a single scalar value that measures the overall performance of a binary classifier (Hanley and McNeil, 1982). The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. The ROC is a probability curve and the AUC represents the degree or measure of separability. It tells how capable the model is of distinguishing between classes.
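In practice the ROC curve and AUC are usually computed with library routines; the sketch below uses scikit-learn's roc_curve and roc_auc_score on simulated scores (the data are invented for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(42)
# Simulated classifier scores: the diseased group (label 1) scores higher on average
y_true = np.concatenate([np.ones(200), np.zeros(200)])
y_score = np.concatenate([rng.normal(1.5, 1.0, 200), rng.normal(0.0, 1.0, 200)])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one ROC point per threshold
auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
print(f"AUC = {auc:.3f}")
```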
Area under ROC Curve - AUC

AUC measures the entire two-dimensional area underneath the ROC curve from (0,0) to (1,1). The AUC value lies within the range [0.0, 1.0], where a value of 0.5 represents the performance of a random classifier and the maximum value corresponds to a perfect classifier (e.g., one with a classification error rate of zero). As AUC ranges in value from 0 to 1, a model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
Area under ROC Curve - AUC

AUC is desirable for the following two reasons:

• AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.
• AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen.
Area under ROC Curve - AUC

The AUC is a robust overall measure for evaluating the performance of score classifiers because its calculation relies on the complete ROC curve and thus involves all possible classification thresholds. The AUC is typically calculated by adding successive trapezoid areas below the ROC curve.
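A sketch of that trapezoidal calculation, assuming fpr and tpr arrays already sorted by increasing false positive rate (for example the arrays returned by roc_curve above):

```python
import numpy as np

# Example ROC points (fpr increasing); in practice these come from the ROC curve
fpr = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
tpr = np.array([0.0, 0.5, 0.8, 0.95, 1.0])

# Sum of successive trapezoid areas below the ROC curve
auc = 0.0
for i in range(1, len(fpr)):
    width = fpr[i] - fpr[i - 1]
    avg_height = (tpr[i] + tpr[i - 1]) / 2
    auc += width * avg_height          # area of one trapezoid
print(f"Trapezoidal AUC = {auc:.3f}")
```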
Area under ROC Curve - AUC

The figure shows the ROC curves for two score classifiers A and B. In this example, classifier A has a larger AUC value than classifier B.
Properties of the ROC

To study some of the properties of the ROC, let us describe it in familiar mathematical notation as the curve y = h(x), where y is the true positive rate tp and x is the false positive rate fp. The points (x, y) on the curve are determined by the threshold t applied to the classification score S, and x and y can be written more precisely as functions of the parameter t, viz., x(t) = p(s > t | N) and y(t) = p(s > t | P). However, we will only use this expanded notation if the presence of this parameter needs to be emphasized.
Property 1

y = h(x) is a monotone increasing function in the positive quadrant, lying between y = 0 at x = 0 and y = 1 at x = 1.

Proof: Consideration of the way that the classifier scores are arranged shows that both x(t) and y(t) increase and decrease together as t varies. Moreover, lim_{t→∞} x(t) = lim_{t→∞} y(t) = 0 and lim_{t→−∞} x(t) = lim_{t→−∞} y(t) = 1, which establishes the result.
Property 2

The ROC curve is unaltered if the classification scores undergo a strictly increasing transformation.

Proof: Suppose that U = φ(S) is a strictly increasing transformation, i.e., S1 > S2 ⇒ U1 = φ(S1) > U2 = φ(S2). Consider the point on the ROC curve for S at threshold value t, and let v = φ(t). Then it follows that

p(U > v | P) = p(φ(S) > φ(t) | P) = p(S > t | P), and
p(U > v | N) = p(φ(S) > φ(t) | N) = p(S > t | N),

so that the same point exists on the ROC curve for U. Applying the reverse argument to each point on the ROC curve for U establishes that the two curves are identical.
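A quick numerical check of this property (a sketch with simulated scores; the exponential is just one convenient strictly increasing transformation):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y = np.concatenate([np.ones(300), np.zeros(300)])
s = np.concatenate([rng.normal(1.0, 1.0, 300), rng.normal(0.0, 1.0, 300)])

# The ROC curve (and hence the AUC) depends only on the ordering of the scores,
# so a strictly increasing transformation such as exp leaves it unchanged.
print(roc_auc_score(y, s))           # AUC of the original scores
print(roc_auc_score(y, np.exp(s)))   # identical value for the transformed scores
```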
Property 3

Provided that the slope of the ROC curve at the point with threshold value t is well defined, it is given by

dy/dx = p(t | P) / p(t | N).

Proof: First note that

y(t) = p(S > t | P) = ∫_t^∞ p(s | P) ds,

so that

dy/dt = −p(t | P).

Moreover,

x(t) = p(S > t | N) = ∫_t^∞ p(s | N) ds,

so

dx/dt = −p(t | N).

Also

dy/dx = (dy/dt) × (dt/dx) = (dy/dt) / (dx/dt) = p(t | P) / p(t | N),

which establishes the result.
2.4 Area under the ROC curve

Probably the most widely used summary index is the area under the ROC curve, commonly denoted AUC and studied by Green and Swets (1966), Bamber (1975), Hanley and McNeil (1982), and Bradley (1997), among others. Simple geometry establishes the upper and lower bounds of AUC: for the case of perfect separation of P and N, AUC is the area under the upper borders of the ROC (i.e., the area of a square of side 1), so the upper bound is 1.0, while for the case of random allocation AUC is the area under the chance diagonal (i.e., the area of a triangle whose base and height are both equal to 1), so the lower bound is 0.5. For all other cases, the formal definition is

AUC = ∫_0^1 y(x) dx.

In other words, if S_P and S_N are the scores allocated to randomly and independently chosen individuals from P and N respectively, then

AUC = p(S_P > S_N).

To prove this result, we start from the definition of AUC given above and change the variable in the integration from the fp rate x to the classifier threshold t. From the proof of Property 3 of the ROC given above we first recollect that

y(t) = p(S > t | P), x(t) = p(S > t | N), and dx/dt = −p(t | N),

and we also note that x → 0 as t → ∞ and x → 1 as t → −∞. Hence

AUC = ∫_0^1 y dx
    = ∫_∞^{−∞} y(t) (dx/dt) dt                         (on changing the variable of integration)
    = ∫_{−∞}^{∞} p(S > t | P) p(t | N) dt              (from the results above)
    = ∫_{−∞}^{∞} p(S_P > t | S_N = t) p(S_N = t) dt    (since S_P and S_N are independent)
    = p(S_P > S_N)                                     (by total probability),

as required.
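This probabilistic interpretation suggests a direct way to estimate AUC: compare every score from P with every score from N and take the proportion of pairs in which the P score is larger (ties counted as one half). The sketch below, with simulated scores, checks that this pairwise estimate agrees with the value returned by scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
s_P = rng.normal(1.0, 1.0, 400)   # scores of individuals from P (diseased)
s_N = rng.normal(0.0, 1.0, 400)   # scores of individuals from N (normal)

# Proportion of (P, N) pairs with the P score larger; ties count one half
diff = s_P[:, None] - s_N[None, :]
auc_pairs = np.mean((diff > 0) + 0.5 * (diff == 0))

y = np.concatenate([np.ones_like(s_P), np.zeros_like(s_N)])
auc_roc = roc_auc_score(y, np.concatenate([s_P, s_N]))

print(auc_pairs, auc_roc)   # the two estimates coincide
```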
2.5 The binormal model

The normal probability distribution has long formed a cornerstone of statistical theory. It is used as the population model for very many situations where the measurements are quantitative, and hence it underpins most basic inferential procedures for such measurements. The reasons for this are partly that empirical evidence suggests that many measurements taken in practice do actually behave roughly like samples from normal populations, but also partly that mathematical results such as the central limit theorem show that the normal distribution provides a perfectly adequate approximation to the true probability distribution of many important statistics. The normal model is thus a "standard" against which any other suggestion is usually measured in common statistical practice.

Likewise, for ROC analysis it is useful to have such a standard model which can be adopted as a first port of call, in the expectation that it will provide a reasonable analysis in many practical situations and against which any specialized analysis can be judged. Such a benchmark is provided by the binormal model, in which we assume the scores S of the classifier to have a normal distribution in each of the two populations P and N. This model will always be "correct" if the original measurements X have multivariate normal distributions in the two populations and the classifier is a linear function of the measurements of the type first derived by Fisher (1936), as is shown in any standard multivariate text book (e.g., Krzanowski and Marriott, 1995, pp. 29-30). However, it is also approximately correct for a much wider set of measurement populations and classifiers. Moreover, as we shall see later in this section, this class is even wider in the specific case of ROC analysis. First, however, let us explore some of the consequences of the binormal assumption.

To be specific, we assume that the distributions of the scores S are normal in both populations and have means μ_P, μ_N and standard deviations σ_P, σ_N in P and N respectively. In accord with the convention that large values of S are indicative of population P and small ones indicative of population N, we further assume that μ_P > μ_N, but we place no constraints on the standard deviations. Then (S − μ_P)/σ_P has a standard normal distribution in P, and (S − μ_N)/σ_N has a standard normal distribution in N. Suppose that the fp rate is x, with corresponding classifier threshold t. Then

x(t) = p(S > t | N) = p(Z > [t − μ_N]/σ_N),

where Z has a standard normal distribution. Thus

x(t) = p(Z < [μ_N − t]/σ_N)   (by symmetry of the normal distribution),

so x(t) = Φ([μ_N − t]/σ_N), where Φ(·) is the normal cumulative distribution function (cdf). Thus if z_x is the value of Z giving rise to this cdf value, then

z_x = Φ⁻¹(x) = (μ_N − t)/σ_N   and   t = μ_N − σ_N Φ⁻¹(x).

Hence the ROC curve at this fp rate is

y(x) = p(S > t | P) = p(Z > [t − μ_P]/σ_P) = Φ([μ_P − t]/σ_P),

and on substituting for the value of t from above we obtain

y(x) = Φ([μ_P − μ_N + σ_N Φ⁻¹(x)]/σ_P).

Thus the ROC curve is of the form y(x) = Φ(a + b Φ⁻¹(x)), or Φ⁻¹(y) = a + b Φ⁻¹(x), where

a = (μ_P − μ_N)/σ_P   and   b = σ_N/σ_P.

It follows from the earlier assumptions that a > 0, while b is clearly nonnegative by definition. The former is known as the intercept of the binormal ROC curve, and the latter as its slope.
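A sketch of the binormal ROC curve y(x) = Φ(a + b Φ⁻¹(x)) using SciPy's normal cdf and quantile function (the parameter values are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import norm

mu_P, sigma_P = 2.0, 1.0   # score distribution in the diseased population P
mu_N, sigma_N = 0.0, 1.0   # score distribution in the normal population N

a = (mu_P - mu_N) / sigma_P   # intercept of the binormal ROC curve
b = sigma_N / sigma_P         # slope of the binormal ROC curve

x = np.linspace(0.001, 0.999, 9)    # false positive rates
y = norm.cdf(a + b * norm.ppf(x))   # corresponding true positive rates
for xi, yi in zip(x, y):
    print(f"fp = {xi:.3f}  tp = {yi:.3f}")
```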
Figure 2.2 shows three ROC curves derived from simple binormal models. The top curve (dotted line) is for the case μ_N = 0, μ_P = 4, σ_N = σ_P = 1, i.e., a mean difference in classification scores of 4 standardized units. This implies virtually complete separation of the two normal populations, so the ROC curve is very close to the best possible one. The middle curve (solid line) is for μ_N = 0, μ_P = 2, σ_N = σ_P = 1, and the reduction in the separation of the means to 2 standardized units is reflected in the poorer ROC curve. Finally, the bottom curve (dashed line) has the same values as the middle one except for σ_P = 2; the further deterioration in performance is caused by the imbalance in the standard deviations of the two populations, the higher standard deviation in population P effectively diluting the difference in population means. It should also be mentioned at this point that if one distribution has a sufficiently large standard deviation relative to that of the other, and to the difference between the population means, then the ROC curve will dip below the chance diagonal.

One very useful consequence of this model is that its AUC can be derived very easily, and has a very simple form. We saw earlier that AUC = p(S_P > S_N) = p(S_P − S_N > 0) for independent S_P and S_N. But standard theory tells us that if S_P ~ N(μ_P, σ_P²) independently of S_N ~ N(μ_N, σ_N²), then S_P − S_N ~ N(μ_P − μ_N, σ_P² + σ_N²). Hence

AUC = Φ([μ_P − μ_N]/√(σ_P² + σ_N²)) = Φ(a/√(1 + b²)),

where Φ denotes the standard normal cdf.
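A sketch checking this closed-form AUC against a Monte Carlo estimate of p(S_P > S_N) (parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm

mu_P, sigma_P = 2.0, 1.5
mu_N, sigma_N = 0.0, 1.0

# Closed-form AUC under the binormal model
auc_formula = norm.cdf((mu_P - mu_N) / np.sqrt(sigma_P**2 + sigma_N**2))

# Monte Carlo estimate of p(S_P > S_N)
rng = np.random.default_rng(0)
s_P = rng.normal(mu_P, sigma_P, 100_000)
s_N = rng.normal(mu_N, sigma_N, 100_000)
auc_mc = np.mean(s_P > s_N)

print(auc_formula, auc_mc)   # the two values agree to two or three decimals
```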
Kullback-Leibler Divergence (KLD)
KLD (Kullback and Leibler, 1951)

To measure the difference between two probability distributions over the same variable x, a measure called the Kullback-Leibler divergence, or simply the KL divergence, has been popularly used in the data-mining literature. The concept originated in probability theory and information theory.

The KL divergence, which is closely related to relative entropy, information divergence, and information for discrimination, is a non-symmetric measure of the difference between two probability distributions p(x) and q(x). Specifically, the Kullback-Leibler (KL) divergence of q(x) from p(x), denoted D_KL(p(x) || q(x)), is a measure of the information lost when q(x) is used to approximate p(x).

Let p(x) and q(x) be two probability distributions of a discrete random variable x. That is, both p(x) and q(x) sum up to 1, and p(x) > 0 and q(x) > 0 for any x in X. D_KL(p(x) || q(x)) is defined by

D_KL(p(x) || q(x)) = Σ_{x ∈ X} p(x) ln[ p(x) / q(x) ].
KLD (Continuation)

The KL divergence measures the expected number of extra bits required to code samples from p(x) when using a code based on q(x), rather than using a code based on p(x). Typically p(x) represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution. The measure q(x) typically represents a theory, model, description, or approximation of p(x).

The continuous version of the KL divergence is

D_KL(p(x) || q(x)) = ∫_{−∞}^{∞} p(x) ln[ p(x) / q(x) ] dx.

Although the KL divergence measures the "distance" between two distributions, it is not a distance measure. This is because the KL divergence is not a metric: it is not symmetric, so the KL divergence from p(x) to q(x) is generally not the same as the KL divergence from q(x) to p(x), and furthermore it need not satisfy the triangle inequality. Nevertheless, D_KL(P || Q) is a non-negative measure: D_KL(P || Q) ≥ 0, and D_KL(P || Q) = 0 if and only if P = Q.

Notice that attention should be paid when computing the KL divergence. We define 0 ln(0/q) = 0. However, when p ≠ 0 but q = 0, D_KL(p || q) is defined as ∞. This means that if one event e is possible (i.e., p(e) > 0) and the other distribution predicts it is absolutely impossible (i.e., q(e) = 0), then the two distributions are absolutely different.
KLD - Example

Consider a discrete random variable x taking the values 0, 1, 2, and two distributions over these values:

x              | 0     | 1     | 2
Distribution P (Binomial, p = 0.4, N = 2) | 9/25  | 12/25 | 4/25
Distribution Q (Uniform)                  | 1/3   | 1/3   | 1/3
KLD - Example (Continuation)

The relative entropies D_KL(P || Q) and D_KL(Q || P) are calculated as follows:

D_KL(P || Q) = Σ_x P(x) ln[ P(x) / Q(x) ]
             = (9/25) ln[(9/25)/(1/3)] + (12/25) ln[(12/25)/(1/3)] + (4/25) ln[(4/25)/(1/3)]
             = (1/25) (32 ln 2 + 55 ln 3 − 50 ln 5) ≈ 0.0852996

D_KL(Q || P) = Σ_x Q(x) ln[ Q(x) / P(x) ]
             = (1/3) ln[(1/3)/(9/25)] + (1/3) ln[(1/3)/(12/25)] + (1/3) ln[(1/3)/(4/25)]
             = (1/3) (−4 ln 2 − 6 ln 3 + 6 ln 5) ≈ 0.097455
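The arithmetic above can be reproduced in a few lines of Python (a sketch using the distributions of the example):

```python
import numpy as np

P = np.array([9/25, 12/25, 4/25])   # Binomial(N=2, p=0.4) over x = 0, 1, 2
Q = np.array([1/3, 1/3, 1/3])       # uniform over x = 0, 1, 2

def kl(p, q):
    """Discrete Kullback-Leibler divergence: sum of p * ln(p/q)."""
    return float(np.sum(p * np.log(p / q)))

print(kl(P, Q))   # approx. 0.0853
print(kl(Q, P))   # approx. 0.0975
```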
KLD Expressions for the Bi-Normal ROC Model

For continuous score variables, assume as in the binormal model above that the classifier scores have density f_N = N(μ_N, σ_N²) in the control population N and density f_P = N(μ_P, σ_P²) in the case population P. The KL divergences between these two normal score densities then have simple closed forms, and for the bi-Normal ROC curve in particular they can be expressed in terms of the intercept a and slope b defined earlier.
KLD for Bi-Normal ROC Model (Continuation)

The KLDs are now

D_KL(f_P || f_N) = ln(σ_N/σ_P) + [σ_P² + (μ_P − μ_N)²] / (2σ_N²) − 1/2,
D_KL(f_N || f_P) = ln(σ_P/σ_N) + [σ_N² + (μ_P − μ_N)²] / (2σ_P²) − 1/2,

and, using a = (μ_P − μ_N)/σ_P and b = σ_N/σ_P, we can write these as

D_KL(f_P || f_N) = ln b + (1 + a²)/(2b²) − 1/2,
D_KL(f_N || f_P) = −ln b + (a² + b²)/2 − 1/2.
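A sketch evaluating closed-form KL divergences between two normal densities and checking them by numerical integration of the continuous definition (the parameter values are arbitrary and only illustrative):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu_P, sigma_P = 3.0, 1.5   # hypothetical case (P) score density
mu_N, sigma_N = 2.0, 1.0   # hypothetical control (N) score density

def kl_normal(mu1, s1, mu2, s2):
    """Closed-form KL divergence D( N(mu1, s1^2) || N(mu2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def kl_numeric(mu1, s1, mu2, s2):
    """Numerical integration of p(x) * ln[p(x)/q(x)] over the real line."""
    f = lambda x: norm.pdf(x, mu1, s1) * (norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))
    return quad(f, -np.inf, np.inf)[0]

print(kl_normal(mu_P, sigma_P, mu_N, sigma_N), kl_numeric(mu_P, sigma_P, mu_N, sigma_N))
print(kl_normal(mu_N, sigma_N, mu_P, sigma_P), kl_numeric(mu_N, sigma_N, mu_P, sigma_P))
```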
KLD for Bi-Normal ROC Model (Continuation)

Figure: KL analysis of a bi-Normal ROC curve. The graph shows the Kullback-Leibler divergences D_KL(f_N || f_P) (the solid line) and D_KL(f_P || f_N) (the dashed line) for two Normal densities: f_P for the cases, whose standard deviation σ_P is varied over a range that includes σ_N, and f_N for the controls, with μ_N = 2.0 and σ_N = 1. When the two divergences are equal, the corresponding ROC curve is symmetric about the negative diagonal; when they differ, the corresponding ROC curve is asymmetric, being TP-asymmetric for one direction of the inequality and TN-asymmetric for the other.