Commit d727cef

Pushing the docs to dev/ for branch: main, commit 0ad90d51537328b7310741d010e569ca6cd33f78
1 parent 0d92612 commit d727cef

File tree

1,267 files changed (+6250 -6200 lines)


dev/_downloads/764d061a261a2e06ad21ec9133361b2d/plot_precision_recall.ipynb

+1 -1

@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Precision-Recall\n\nExample of Precision-Recall metric to evaluate classifier output quality.\n\nPrecision-Recall is a useful measure of success of prediction when the\nclasses are very imbalanced. In information retrieval, precision is a\nmeasure of result relevancy, while recall is a measure of how many truly\nrelevant results are returned.\n\nThe precision-recall curve shows the tradeoff between precision and\nrecall for different threshold. A high area under the curve represents\nboth high recall and high precision, where high precision relates to a\nlow false positive rate, and high recall relates to a low false negative\nrate. High scores for both show that the classifier is returning accurate\nresults (high precision), as well as returning a majority of all positive\nresults (high recall).\n\nA system with high recall but low precision returns many results, but most of\nits predicted labels are incorrect when compared to the training labels. A\nsystem with high precision but low recall is just the opposite, returning very\nfew results, but most of its predicted labels are correct when compared to the\ntraining labels. An ideal system with high precision and high recall will\nreturn many results, with all results labeled correctly.\n\nPrecision ($P$) is defined as the number of true positives ($T_p$)\nover the number of true positives plus the number of false positives\n($F_p$).\n\n$P = \\frac{T_p}{T_p+F_p}$\n\nRecall ($R$) is defined as the number of true positives ($T_p$)\nover the number of true positives plus the number of false negatives\n($F_n$).\n\n$R = \\frac{T_p}{T_p + F_n}$\n\nThese quantities are also related to the $F_1$ score, which is the\nharmonic mean of precision and recall. Thus, we can compute the $F_1$\nusing the following formula:\n\n$F_1 = \\frac{2T_p}{2T_p + F_p + F_n}$\n\nNote that the precision may not decrease with recall. The\ndefinition of precision ($\\frac{T_p}{T_p + F_p}$) shows that lowering\nthe threshold of a classifier may increase the denominator, by increasing the\nnumber of results returned. If the threshold was previously set too high, the\nnew results may all be true positives, which will increase precision. If the\nprevious threshold was about right or too low, further lowering the threshold\nwill introduce false positives, decreasing precision.\n\nRecall is defined as $\\frac{T_p}{T_p+F_n}$, where $T_p+F_n$ does\nnot depend on the classifier threshold. This means that lowering the classifier\nthreshold may increase recall, by increasing the number of true positive\nresults. It is also possible that lowering the threshold may leave recall\nunchanged, while the precision fluctuates.\n\nThe relationship between recall and precision can be observed in the\nstairstep area of the plot - at the edges of these steps a small change\nin the threshold considerably reduces precision, with only a minor gain in\nrecall.\n\n**Average precision** (AP) summarizes such a plot as the weighted mean of\nprecisions achieved at each threshold, with the increase in recall from the\nprevious threshold used as the weight:\n\n$\\text{AP} = \\sum_n (R_n - R_{n-1}) P_n$\n\nwhere $P_n$ and $R_n$ are the precision and recall at the\nnth threshold. A pair $(R_k, P_k)$ is referred to as an\n*operating point*.\n\nAP and the trapezoidal area under the operating points\n(:func:`sklearn.metrics.auc`) are common ways to summarize a precision-recall\ncurve that lead to different results. Read more in the\n`User Guide <precision_recall_f_measure_metrics>`.\n\nPrecision-recall curves are typically used in binary classification to study\nthe output of a classifier. In order to extend the precision-recall curve and\naverage precision to multi-class or multi-label classification, it is necessary\nto binarize the output. One curve can be drawn per label, but one can also draw\na precision-recall curve by considering each element of the label indicator\nmatrix as a binary prediction (micro-averaging).\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>See also :func:`sklearn.metrics.average_precision_score`,\n :func:`sklearn.metrics.recall_score`,\n :func:`sklearn.metrics.precision_score`,\n :func:`sklearn.metrics.f1_score`</p></div>\n"
+"\n# Precision-Recall\n\nExample of Precision-Recall metric to evaluate classifier output quality.\n\nPrecision-Recall is a useful measure of success of prediction when the\nclasses are very imbalanced. In information retrieval, precision is a\nmeasure of the fraction of relevant items among actually returned items while recall\nis a measure of the fraction of items that were returned among all items that should\nhave been returned. 'Relevancy' here refers to items that are\npositively labeled, i.e., true positives and false negatives.\n\nPrecision ($P$) is defined as the number of true positives ($T_p$)\nover the number of true positives plus the number of false positives\n($F_p$).\n\n\\begin{align}P = \\frac{T_p}{T_p+F_p}\\end{align}\n\nRecall ($R$) is defined as the number of true positives ($T_p$)\nover the number of true positives plus the number of false negatives\n($F_n$).\n\n\\begin{align}R = \\frac{T_p}{T_p + F_n}\\end{align}\n\nThe precision-recall curve shows the tradeoff between precision and\nrecall for different thresholds. A high area under the curve represents\nboth high recall and high precision. High precision is achieved by having\nfew false positives in the returned results, and high recall is achieved by\nhaving few false negatives in the relevant results.\nHigh scores for both show that the classifier is returning\naccurate results (high precision), as well as returning a majority of all relevant\nresults (high recall).\n\nA system with high recall but low precision returns most of the relevant items, but\nthe proportion of returned results that are incorrectly labeled is high. A\nsystem with high precision but low recall is just the opposite, returning very\nfew of the relevant items, but most of its predicted labels are correct when compared\nto the actual labels. An ideal system with high precision and high recall will\nreturn most of the relevant items, with most results labeled correctly.\n\nThe definition of precision ($\\frac{T_p}{T_p + F_p}$) shows that lowering\nthe threshold of a classifier may increase the denominator, by increasing the\nnumber of results returned. If the threshold was previously set too high, the\nnew results may all be true positives, which will increase precision. If the\nprevious threshold was about right or too low, further lowering the threshold\nwill introduce false positives, decreasing precision.\n\nRecall is defined as $\\frac{T_p}{T_p+F_n}$, where $T_p+F_n$ does\nnot depend on the classifier threshold. Changing the classifier threshold can only\nchange the numerator, $T_p$. Lowering the classifier\nthreshold may increase recall, by increasing the number of true positive\nresults. It is also possible that lowering the threshold may leave recall\nunchanged, while the precision fluctuates. Thus, precision does not necessarily\ndecrease with recall.\n\nThe relationship between recall and precision can be observed in the\nstairstep area of the plot - at the edges of these steps a small change\nin the threshold considerably reduces precision, with only a minor gain in\nrecall.\n\n**Average precision** (AP) summarizes such a plot as the weighted mean of\nprecisions achieved at each threshold, with the increase in recall from the\nprevious threshold used as the weight:\n\n$\\text{AP} = \\sum_n (R_n - R_{n-1}) P_n$\n\nwhere $P_n$ and $R_n$ are the precision and recall at the\nnth threshold. A pair $(R_k, P_k)$ is referred to as an\n*operating point*.\n\nAP and the trapezoidal area under the operating points\n(:func:`sklearn.metrics.auc`) are common ways to summarize a precision-recall\ncurve that lead to different results. Read more in the\n`User Guide <precision_recall_f_measure_metrics>`.\n\nPrecision-recall curves are typically used in binary classification to study\nthe output of a classifier. In order to extend the precision-recall curve and\naverage precision to multi-class or multi-label classification, it is necessary\nto binarize the output. One curve can be drawn per label, but one can also draw\na precision-recall curve by considering each element of the label indicator\nmatrix as a binary prediction (`micro-averaging <average>`).\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>See also :func:`sklearn.metrics.average_precision_score`,\n :func:`sklearn.metrics.recall_score`,\n :func:`sklearn.metrics.precision_score`,\n :func:`sklearn.metrics.f1_score`</p></div>\n"
 ]
 },
 {
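
The docstring edited above maps directly onto scikit-learn's metrics API. As a minimal, illustrative sketch (not part of this commit; the synthetic dataset, classifier, and variable names are assumptions), the following computes a precision-recall curve on an imbalanced binary problem and checks that average_precision_score agrees with the weighted-mean formula AP = sum_n (R_n - R_{n-1}) P_n quoted in the text:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced synthetic binary problem (roughly 10% positives).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = clf.decision_function(X_test)

# One (precision, recall) operating point per decision threshold.
precision, recall, thresholds = precision_recall_curve(y_test, y_score)
ap = average_precision_score(y_test, y_score)

# AP = sum_n (R_n - R_{n-1}) P_n; recall is returned in decreasing order,
# hence the sign flip on the differences.
ap_by_hand = -np.sum(np.diff(recall) * precision[:-1])
print(f"AP = {ap:.3f}, recomputed from the curve = {ap_by_hand:.3f}")

Each (recall[i], precision[i]) pair returned by precision_recall_curve is one of the operating points described above; lowering the threshold moves along the curve toward higher recall.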

dev/_downloads/98161c8b335acb98de356229c1005819/plot_precision_recall.py

+28 -28

@@ -7,55 +7,55 @@
 
 Precision-Recall is a useful measure of success of prediction when the
 classes are very imbalanced. In information retrieval, precision is a
-measure of result relevancy, while recall is a measure of how many truly
-relevant results are returned.
-
-The precision-recall curve shows the tradeoff between precision and
-recall for different threshold. A high area under the curve represents
-both high recall and high precision, where high precision relates to a
-low false positive rate, and high recall relates to a low false negative
-rate. High scores for both show that the classifier is returning accurate
-results (high precision), as well as returning a majority of all positive
-results (high recall).
-
-A system with high recall but low precision returns many results, but most of
-its predicted labels are incorrect when compared to the training labels. A
-system with high precision but low recall is just the opposite, returning very
-few results, but most of its predicted labels are correct when compared to the
-training labels. An ideal system with high precision and high recall will
-return many results, with all results labeled correctly.
+measure of the fraction of relevant items among actually returned items while recall
+is a measure of the fraction of items that were returned among all items that should
+have been returned. 'Relevancy' here refers to items that are
+positively labeled, i.e., true positives and false negatives.
 
 Precision (:math:`P`) is defined as the number of true positives (:math:`T_p`)
 over the number of true positives plus the number of false positives
 (:math:`F_p`).
 
-:math:`P = \\frac{T_p}{T_p+F_p}`
+.. math::
+    P = \\frac{T_p}{T_p+F_p}
 
 Recall (:math:`R`) is defined as the number of true positives (:math:`T_p`)
 over the number of true positives plus the number of false negatives
 (:math:`F_n`).
 
-:math:`R = \\frac{T_p}{T_p + F_n}`
+.. math::
+    R = \\frac{T_p}{T_p + F_n}
 
-These quantities are also related to the :math:`F_1` score, which is the
-harmonic mean of precision and recall. Thus, we can compute the :math:`F_1`
-using the following formula:
+The precision-recall curve shows the tradeoff between precision and
+recall for different thresholds. A high area under the curve represents
+both high recall and high precision. High precision is achieved by having
+few false positives in the returned results, and high recall is achieved by
+having few false negatives in the relevant results.
+High scores for both show that the classifier is returning
+accurate results (high precision), as well as returning a majority of all relevant
+results (high recall).
 
-:math:`F_1 = \\frac{2T_p}{2T_p + F_p + F_n}`
+A system with high recall but low precision returns most of the relevant items, but
+the proportion of returned results that are incorrectly labeled is high. A
+system with high precision but low recall is just the opposite, returning very
+few of the relevant items, but most of its predicted labels are correct when compared
+to the actual labels. An ideal system with high precision and high recall will
+return most of the relevant items, with most results labeled correctly.
 
-Note that the precision may not decrease with recall. The
-definition of precision (:math:`\\frac{T_p}{T_p + F_p}`) shows that lowering
+The definition of precision (:math:`\\frac{T_p}{T_p + F_p}`) shows that lowering
 the threshold of a classifier may increase the denominator, by increasing the
 number of results returned. If the threshold was previously set too high, the
 new results may all be true positives, which will increase precision. If the
 previous threshold was about right or too low, further lowering the threshold
 will introduce false positives, decreasing precision.
 
 Recall is defined as :math:`\\frac{T_p}{T_p+F_n}`, where :math:`T_p+F_n` does
-not depend on the classifier threshold. This means that lowering the classifier
+not depend on the classifier threshold. Changing the classifier threshold can only
+change the numerator, :math:`T_p`. Lowering the classifier
 threshold may increase recall, by increasing the number of true positive
 results. It is also possible that lowering the threshold may leave recall
-unchanged, while the precision fluctuates.
+unchanged, while the precision fluctuates. Thus, precision does not necessarily
+decrease with recall.
 
 The relationship between recall and precision can be observed in the
 stairstep area of the plot - at the edges of these steps a small change
@@ -82,7 +82,7 @@
 average precision to multi-class or multi-label classification, it is necessary
 to binarize the output. One curve can be drawn per label, but one can also draw
 a precision-recall curve by considering each element of the label indicator
-matrix as a binary prediction (micro-averaging).
+matrix as a binary prediction (:ref:`micro-averaging <average>`).
 
 .. note::
 
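The second hunk rewords the micro-averaging sentence. As a rough sketch of what that sentence means in code (again not part of this commit; the one-vs-rest setup and synthetic three-class data are assumptions), micro-averaging flattens the binarized label-indicator matrix and the score matrix and computes a single precision-recall curve over all (sample, class) pairs:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay, average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Synthetic 3-class problem, binarized into an (n_samples, 3) label indicator matrix.
X, y = make_classification(n_samples=1500, n_classes=3, n_informative=5, random_state=0)
Y = label_binarize(y, classes=[0, 1, 2])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# One binary classifier per class, trained on the indicator columns.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)
Y_score = clf.decision_function(X_test)

# "micro" treats every cell of the indicator matrix as one binary prediction.
ap_micro = average_precision_score(Y_test, Y_score, average="micro")
print(f"micro-averaged AP = {ap_micro:.3f}")

# The flattened arrays give the same micro-averaged precision-recall curve.
disp = PrecisionRecallDisplay.from_predictions(Y_test.ravel(), Y_score.ravel())

Per-class curves can be drawn the same way by passing one column of Y_test and Y_score at a time.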

dev/_downloads/scikit-learn-docs.zip

-4.94 KB

dev/_sources/auto_examples/applications/plot_cyclical_feature_engineering.rst.txt (+1 -1)
dev/_sources/auto_examples/applications/plot_digits_denoising.rst.txt (+1 -1)
dev/_sources/auto_examples/applications/plot_face_recognition.rst.txt (+5 -5)
dev/_sources/auto_examples/applications/plot_model_complexity_influence.rst.txt (+15 -15)
