Precision-Recall is a useful measure of success of prediction when the
classes are very imbalanced. In information retrieval, precision is a
measure of the fraction of relevant items among the items actually returned,
while recall is a measure of the fraction of items that were returned among
all items that should have been returned. 'Relevancy' here refers to items
that are positively labeled, i.e., true positives and false negatives.

Precision (:math:`P`) is defined as the number of true positives (:math:`T_p`)
over the number of true positives plus the number of false positives
(:math:`F_p`).

.. math::
    P = \\frac{T_p}{T_p+F_p}

Recall (:math:`R`) is defined as the number of true positives (:math:`T_p`)
over the number of true positives plus the number of false negatives
(:math:`F_n`).

.. math::
    R = \\frac{T_p}{T_p + F_n}
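
As a quick illustration (not part of this example's code), both quantities can
be computed with :func:`sklearn.metrics.precision_score` and
:func:`sklearn.metrics.recall_score`; the toy labels below are made up for
this sketch::

    from sklearn.metrics import precision_score, recall_score

    # Hypothetical ground truth and predictions, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

    # T_p = 3, F_p = 1, F_n = 1
    print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
    print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75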

These quantities are also related to the :math:`F_1` score, which is the
harmonic mean of precision and recall. Thus, we can compute the :math:`F_1`
using the following formula:

.. math::
    F_1 = \\frac{2T_p}{2T_p + F_p + F_n}

The precision-recall curve shows the tradeoff between precision and
recall for different thresholds. A high area under the curve represents
both high recall and high precision. High precision is achieved by having
few false positives in the returned results, and high recall is achieved by
having few false negatives among the relevant results. High scores for both
show that the classifier is returning accurate results (high precision), as
well as returning a majority of all relevant results (high recall).

A system with high recall but low precision returns most of the relevant items,
but the proportion of returned results that are incorrectly labeled is high. A
system with high precision but low recall is just the opposite, returning very
few of the relevant items, but most of its predicted labels are correct when
compared to the actual labels. An ideal system with high precision and high
recall will return most of the relevant items, with most results labeled
correctly.
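
Such a curve can be drawn from predicted scores with
:func:`sklearn.metrics.precision_recall_curve`, and its area can be summarized
with :func:`sklearn.metrics.average_precision_score`. The snippet below is
only a rough sketch on made-up scores, not the code of this example::

    import matplotlib.pyplot as plt
    from sklearn.metrics import average_precision_score, precision_recall_curve

    # Hypothetical binary labels and classifier scores, for illustration only.
    y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3, 0.9, 0.5]

    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    print(average_precision_score(y_true, y_score))  # single-number summary

    plt.step(recall, precision, where="post")  # stairstep precision-recall curve
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.show()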

The definition of precision (:math:`\\frac{T_p}{T_p + F_p}`) shows that lowering
the threshold of a classifier may increase the denominator, by increasing the
number of results returned. If the threshold was previously set too high, the
new results may all be true positives, which will increase precision. If the
previous threshold was about right or too low, further lowering the threshold
will introduce false positives, decreasing precision.

Recall is defined as :math:`\\frac{T_p}{T_p+F_n}`, where :math:`T_p+F_n` does
not depend on the classifier threshold. Changing the classifier threshold can
only change the numerator, :math:`T_p`. Lowering the classifier threshold may
increase recall, by increasing the number of true positive results. It is also
possible that lowering the threshold may leave recall unchanged, while the
precision fluctuates. Thus, precision does not necessarily decrease with
recall.
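
This behaviour can be checked directly on the output of
:func:`sklearn.metrics.precision_recall_curve`, which reports precision and
recall over a range of decision thresholds; the labels and scores below are
invented for the sketch::

    from sklearn.metrics import precision_recall_curve

    # Hypothetical labels and scores, for illustration only.
    y_true = [0, 1, 0, 1, 1]
    y_score = [0.1, 0.3, 0.5, 0.7, 0.9]

    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # Thresholds are returned in increasing order, so reading the arrays from
    # the end towards the start corresponds to lowering the threshold: recall
    # never decreases, while precision can move in either direction.
    for p, r, t in zip(precision[:-1], recall[:-1], thresholds):
        print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")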

The relationship between recall and precision can be observed in the
stairstep area of the plot - at the edges of these steps a small change
in the threshold considerably reduces precision, with only a minor gain in
recall.

In order to extend the precision-recall curve and average precision to
multi-class or multi-label classification, it is necessary to binarize the
output. One curve can be drawn per label, but one can also draw a
precision-recall curve by considering each element of the label indicator
matrix as a binary prediction (:ref:`micro-averaging <average>`).
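
A micro-averaged curve of this kind can be obtained by flattening the label
indicator matrix and the score matrix before calling
:func:`sklearn.metrics.precision_recall_curve`; the tiny matrices below are
invented purely for illustration::

    import numpy as np
    from sklearn.metrics import average_precision_score, precision_recall_curve

    # Hypothetical 3-class label indicator matrix and predicted scores.
    Y_true = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1],
                       [1, 0, 0]])
    Y_score = np.array([[0.8, 0.1, 0.1],
                        [0.3, 0.5, 0.2],
                        [0.2, 0.2, 0.6],
                        [0.4, 0.4, 0.2]])

    # Treat every entry of the indicator matrix as one binary prediction.
    precision, recall, _ = precision_recall_curve(Y_true.ravel(), Y_score.ravel())
    print(average_precision_score(Y_true, Y_score, average="micro"))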

.. note::