Commit 70ac14b

lucyleeow authored and glemaitre committed

DOC Improve plot_precision_recall (#28967)

1 parent 511fb4d commit 70ac14b
File tree

1 file changed: examples/model_selection/plot_precision_recall.py
+28 -28 lines changed (28 additions, 28 deletions)
@@ -7,55 +7,55 @@
 Precision-Recall is a useful measure of success of prediction when the
 classes are very imbalanced. In information retrieval, precision is a
-measure of result relevancy, while recall is a measure of how many truly
-relevant results are returned.
-
-The precision-recall curve shows the tradeoff between precision and
-recall for different threshold. A high area under the curve represents
-both high recall and high precision, where high precision relates to a
-low false positive rate, and high recall relates to a low false negative
-rate. High scores for both show that the classifier is returning accurate
-results (high precision), as well as returning a majority of all positive
-results (high recall).
-
-A system with high recall but low precision returns many results, but most of
-its predicted labels are incorrect when compared to the training labels. A
-system with high precision but low recall is just the opposite, returning very
-few results, but most of its predicted labels are correct when compared to the
-training labels. An ideal system with high precision and high recall will
-return many results, with all results labeled correctly.
+measure of the fraction of relevant items among actually returned items while recall
+is a measure of the fraction of items that were returned among all items that should
+have been returned. 'Relevancy' here refers to items that are
+positively labeled, i.e., true positives and false negatives.
 
 Precision (:math:`P`) is defined as the number of true positives (:math:`T_p`)
 over the number of true positives plus the number of false positives
 (:math:`F_p`).
 
-:math:`P = \\frac{T_p}{T_p+F_p}`
+.. math::
+    P = \\frac{T_p}{T_p+F_p}
 
 Recall (:math:`R`) is defined as the number of true positives (:math:`T_p`)
 over the number of true positives plus the number of false negatives
 (:math:`F_n`).
 
-:math:`R = \\frac{T_p}{T_p + F_n}`
+.. math::
+    R = \\frac{T_p}{T_p + F_n}
 
-These quantities are also related to the :math:`F_1` score, which is the
-harmonic mean of precision and recall. Thus, we can compute the :math:`F_1`
-using the following formula:
+The precision-recall curve shows the tradeoff between precision and
+recall for different thresholds. A high area under the curve represents
+both high recall and high precision. High precision is achieved by having
+few false positives in the returned results, and high recall is achieved by
+having few false negatives in the relevant results.
+High scores for both show that the classifier is returning
+accurate results (high precision), as well as returning a majority of all relevant
+results (high recall).
 
-:math:`F_1 = \\frac{2T_p}{2T_p + F_p + F_n}`
+A system with high recall but low precision returns most of the relevant items, but
+the proportion of returned results that are incorrectly labeled is high. A
+system with high precision but low recall is just the opposite, returning very
+few of the relevant items, but most of its predicted labels are correct when compared
+to the actual labels. An ideal system with high precision and high recall will
+return most of the relevant items, with most results labeled correctly.
 
-Note that the precision may not decrease with recall. The
-definition of precision (:math:`\\frac{T_p}{T_p + F_p}`) shows that lowering
+The definition of precision (:math:`\\frac{T_p}{T_p + F_p}`) shows that lowering
 the threshold of a classifier may increase the denominator, by increasing the
 number of results returned. If the threshold was previously set too high, the
 new results may all be true positives, which will increase precision. If the
 previous threshold was about right or too low, further lowering the threshold
 will introduce false positives, decreasing precision.
 
 Recall is defined as :math:`\\frac{T_p}{T_p+F_n}`, where :math:`T_p+F_n` does
-not depend on the classifier threshold. This means that lowering the classifier
+not depend on the classifier threshold. Changing the classifier threshold can only
+change the numerator, :math:`T_p`. Lowering the classifier
 threshold may increase recall, by increasing the number of true positive
 results. It is also possible that lowering the threshold may leave recall
-unchanged, while the precision fluctuates.
+unchanged, while the precision fluctuates. Thus, precision does not necessarily
+decrease with recall.
 
 The relationship between recall and precision can be observed in the
 stairstep area of the plot - at the edges of these steps a small change
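The precision, recall and F1 definitions discussed in this hunk, and the claim that lowering the threshold never decreases recall while precision can fluctuate, can be checked with a small sketch in plain Python. This is an illustrative example, not part of the commit; the `scores`/`labels` toy data and the helper names `precision`, `recall`, `f1` and `pr_at` are invented for the demonstration.

```python
# Hedged sketch (not from the commit): the P, R and F1 formulas from the
# docstring, plus a toy threshold sweep. All data below is invented.
def precision(tp, fp):
    return tp / (tp + fp)                # P = Tp / (Tp + Fp)

def recall(tp, fn):
    return tp / (tp + fn)                # R = Tp / (Tp + Fn)

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)   # harmonic mean of P and R

def pr_at(threshold, scores, labels):
    # Return (precision, recall) when predicting positive for score >= threshold.
    pred = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(pred, labels))
    fp = sum(p and y == 0 for p, y in zip(pred, labels))
    fn = sum(not p and y == 1 for p, y in zip(pred, labels))
    return precision(tp, fp), recall(tp, fn)

scores = [0.9, 0.8, 0.7, 0.3, 0.2]   # classifier scores, sorted descending
labels = [1, 1, 0, 1, 0]             # ground-truth labels for each score
history = [pr_at(t, scores, labels) for t in (0.85, 0.75, 0.5, 0.25)]
# Lowering the threshold never decreases recall, but precision fluctuates:
# it drops when the newly returned item is a false positive (t=0.5) and
# recovers when it is a true positive (t=0.25).
```

Sweeping thresholds by hand like this is exactly what `sklearn.metrics.precision_recall_curve` automates over every distinct score value.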
@@ -82,7 +82,7 @@
 average precision to multi-class or multi-label classification, it is necessary
 to binarize the output. One curve can be drawn per label, but one can also draw
 a precision-recall curve by considering each element of the label indicator
-matrix as a binary prediction (micro-averaging).
+matrix as a binary prediction (:ref:`micro-averaging <average>`).
 
 .. note::
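Micro-averaging, as described in this hunk, treats every entry of the label indicator matrix as an independent binary decision. A minimal sketch of that flattening step, using invented toy matrices `Y` and `Y_score` (not from the commit):

```python
# Hedged sketch (not from the commit): micro-averaging a multi-label problem
# by flattening the label indicator matrix. Y and Y_score are hypothetical.
Y = [[1, 0, 0],
     [0, 1, 1],
     [1, 0, 1]]          # label indicator matrix, shape (3 samples, 3 classes)
Y_score = [[0.8, 0.6, 0.4],
           [0.2, 0.9, 0.7],
           [0.3, 0.1, 0.8]]

# Flatten: every (sample, class) entry becomes one binary decision.
y_true = [v for row in Y for v in row]
y_score = [v for row in Y_score for v in row]

def micro_pr(threshold):
    # Micro-averaged (precision, recall) over the flattened decisions.
    pred = [s >= threshold for s in y_score]
    tp = sum(p and t == 1 for p, t in zip(pred, y_true))
    fp = sum(p and t == 0 for p, t in zip(pred, y_true))
    fn = sum(not p and t == 1 for p, t in zip(pred, y_true))
    return tp / (tp + fp), tp / (tp + fn)
```

In scikit-learn the same flattened arrays can be passed to `precision_recall_curve`, and `average_precision_score` accepts `average="micro"` to summarize the resulting curve.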