[MRG] Adding pos_label parameter to roc_auc_score #9567
Conversation
sklearn/metrics/ranking.py (outdated)
            sample_weight=sample_weight)
        return auc(fpr, tpr, reorder=True)

    _partial_binary_roc_auc_score = partial(_binary_roc_auc_score,
we're using _average_binary_score here to do multi-label, right? In that case we don't need pos_label (it doesn't make sense). I would check type_of_target(y) here and if we are binary use pos_label, and otherwise call _average_binary_score.
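A rough sketch of that dispatch, with roc_curve/auc standing in for the private _binary_roc_auc_score helper and a plain macro average standing in for _average_binary_score (names and structure here are illustrative, not the PR's exact code):

import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.utils.multiclass import type_of_target

def _binary_roc_auc(y_true, y_score, sample_weight=None, pos_label=None):
    fpr, tpr, _ = roc_curve(y_true, y_score, pos_label=pos_label,
                            sample_weight=sample_weight)
    return auc(fpr, tpr)

def roc_auc_sketch(y_true, y_score, sample_weight=None, pos_label=None):
    if type_of_target(y_true) == "binary":
        # Binary case: pos_label is meaningful, pass it through.
        return _binary_roc_auc(y_true, y_score,
                               sample_weight=sample_weight,
                               pos_label=pos_label)
    # Multilabel-indicator case: macro-average the per-column binary
    # scores; pos_label does not apply here.
    return np.mean([_binary_roc_auc(y_true[:, k], y_score[:, k],
                                    sample_weight=sample_weight)
                    for k in range(y_true.shape[1])])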
@amueller Thanks. I removed pos_label support for multilabel-indicator and the related test. Is this what you want?
ping @amueller (and maybe @jnothman, @agramfort in the original pull request)
sklearn/metrics/ranking.py (outdated)
            _partial_binary_roc_auc_score, y_true, y_score, average,
            sample_weight=sample_weight)
    else:
        return _average_binary_score(
What should we do with pos_label in this case? Use it? Raise an error if it is set?
@@ -257,6 +257,7 @@
    "macro_f0.5_score", "macro_f1_score", "macro_f2_score",
    "macro_precision_score", "macro_recall_score",
    "roc_auc_score",
I don't think this affects any common tests. There are two uses of METRICS_WITH_POS_LABEL. One does not apply, I think; the other I suspect should apply, but is being explicitly limited to a handful of metrics in a way that is very ugly and IMO inappropriate...
(Although I'm also confused why roc_auc_score is in METRICS_UNDEFINED_BINARY rather than METRICS_UNDEFINED_MULTICLASS.)
I'd appreciate it if you could do a bit of an audit of the common tests wrt roc_auc_score and related metrics.
roc_auc_score_2 = roc_auc_score(y_true_1, y_pred, pos_label=1)
assert_almost_equal(roc_auc_score_1, roc_auc_score_2)
roc_auc_score_3 = roc_auc_score(y_true_1, y_pred, pos_label=0)
assert_almost_equal(roc_auc_score_1, 1-roc_auc_score_3)
Spaces around -, please.
PEP8 only recently allowed not having spaces around binary operators, but spaces should only be removed when the grouping is visually helpful, in contrast to other operators with spaces (e.g. a*b + c). It is not, here.
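For illustration, the requested fix next to PEP 8's own example of helpful tight spacing (the values here are just placeholders):

from numpy.testing import assert_almost_equal

roc_auc_score_1, roc_auc_score_3 = 0.75, 0.25  # placeholder values
assert_almost_equal(roc_auc_score_1, 1 - roc_auc_score_3)  # spaces around '-'

x, y = 3.0, 4.0
hypot2 = x*x + y*y  # PEP 8's case where tight spacing groups the products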
@jnothman Thanks for the review.
For multilabel, how about requiring it to be None or 1?
@jnothman Thanks. I think you are asking me to raise an error here? Is the current version what you want?
Yes, although I'd permit pos_label=1 too. We don't currently support strings for multilabel and multioutput.
@jnothman Thanks for your precious time :) I followed your suggestion and support pos_label=1 for multilabel-indicator y_true (even though, in this situation, pos_label is not actually used).
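A minimal sketch of the agreed-upon check (the helper name is made up for illustration; the error wording follows the "fixed to 1" phrasing suggested later in this review):

def _check_multilabel_pos_label(pos_label):
    # For multilabel-indicator y_true, the positive label is fixed to 1:
    # accept None (the default) or an explicit 1, reject anything else.
    if pos_label is not None and pos_label != 1:
        raise ValueError("pos_label is fixed to 1 for "
                         "multilabel-indicator y_true")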
sklearn/metrics/ranking.py (outdated)
@@ -226,6 +227,9 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.

    pos_label : int or str, default=None
        The label of the positive class
What does None mean?
Mention "for binary y_true".
roc_auc_score_3 = roc_auc_score(y_true_1, y_pred, pos_label=0)
assert_almost_equal(roc_auc_score_1, 1 - roc_auc_score_3)

# Test int pos_label and binary y_true
This comment is a duplicate of the one above.
roc_auc_score_3 = roc_auc_score(y_true_3, y_pred, pos_label='False')
assert_almost_equal(roc_auc_score_1, 1 - roc_auc_score_3)

# Raise an error for multilabel-indicator y_true with
Everything above this in the test looks like it should be handled in common tests, no?
@jnothman Thanks for the review.
ping @jnothman
Actually, I suspect None should mean "the greater of two values" (unless there is only one value present). You can fix the common testing of roc_auc_score here or in a new PR.
@@ -595,7 +596,7 @@ def test_invariance_string_vs_numbers_labels():
    for name, metric in THRESHOLDED_METRICS.items():
        if name in ("log_loss", "hinge_loss", "unnormalized_log_loss",
-                   "brier_score_loss"):
+                   "brier_score_loss", "roc_auc_score"):
            # Ugly, but handle case with a pos_label and label
Not sure what "and label" means here. Any idea?
What metrics are we excluding here? Why do we need an explicit list of names, where otherwise these lists are defined at the top with meaningful names?
I know it's not your problem, but this is a mess :\
ping @jnothman
This is the case in 'brier_score_loss' but not the case here. See _binary_clf_curve in ranking.py. Here, the problem seems not so simple; I'll open another PR and ping you.
The comment was introduced in a798d9e. I tend to believe that it is a simple copy-paste from above (CLASSIFICATION_METRICS) without actually implementing it. Some of it is in the TODO list (see the TODO mark before METRICS_WITH_LABELS).
The excluded metrics:
roc_auc_score  # removed in this PR
macro_roc_auc
micro_roc_auc
samples_roc_auc
weighted_roc_auc
average_precision_score
macro_average_precision_score
micro_average_precision_score
samples_average_precision_score
weighted_average_precision_score
coverage_error
label_ranking_average_precision_score
label_ranking_loss
If needed, I'll further investigate.
ping @jnothman for the previous comment, along with the PR itself.
I don't know what you want me to do with this PR, since the change is not tested.
sklearn/metrics/ranking.py (outdated)
@@ -226,6 +227,9 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.

    pos_label : int or str, default=None
        The label of the positive class. Only make sense for binary y_true.
default=None means nothing here. Don't say it. You can say "optional" then describe the default below. I think the default should be to take the greater of two class labels, or 1 if only one class is present. State that for multilabel, it is fixed to 1.
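A possible wording along those lines (illustrative only, not the merged text):

    pos_label : int or str, optional
        The label of the positive class for binary y_true. By default, the
        greater of the two class labels is used, or 1 if only one class is
        present. For multilabel-indicator y_true, pos_label is fixed to 1.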
sklearn/metrics/ranking.py (outdated)
            sample_weight=sample_weight)
    else:
        if pos_label is not None and pos_label != 1:
            raise ValueError("Parameter pos_label doesn't make sense for "
Say that it is "fixed to 1" rather than "doesn't make sense". It does make sense, but we refuse to make it an option, I think...
@jnothman Thanks. Comments addressed.
I just want to iteratively improve the code until it seems OK :) Seems that all the cases have been tested; could you please tell me what else I need to test?
From my perspective, this PR mainly aims to provide what we already have in the top-level functions (e.g., roc_auc_score in ranking.py). If we want to change the behavior of pos_label=None, we will need to modify existing low-level functions (e.g., _binary_clf_curve in ranking.py).
Yeah, I think that may be a bug in _binary_clf_curve. The default should be classes[-1]; otherwise, I think cross-validation over a binary classifier with string labels will break.
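A minimal sketch of the default resolution discussed here (the helper name _resolve_pos_label is hypothetical; the classes[-1] rule is the one proposed above):

import numpy as np

def _resolve_pos_label(y_true, pos_label=None):
    # pos_label=None resolves to the greater of the two class labels
    # (classes[-1], since np.unique sorts), or to 1 when only one class
    # is present, per the rule proposed in this thread.
    if pos_label is not None:
        return pos_label
    classes = np.unique(y_true)
    return classes[-1] if len(classes) > 1 else 1

# e.g. _resolve_pos_label(np.array(["ham", "spam"])) returns "spam"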
Reference Issue
Finish up #6874
Fixes #6873
What does this implement/fix? Explain your changes.
Improvements:
(1) Add roc_auc_score to METRICS_WITH_POS_LABEL.
(2) Simplify the test, since we can already ensure the correctness of roc_auc_score without pos_label.
(3) Extend the test to str pos_label and binary y_true.
(4) Remove the meaningless pos_label support for multilabel-indicator y_true.
Any other comments?
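As a quick illustration of the behavior the new tests assert, here is a runnable sketch using roc_curve (which already accepts pos_label) to emulate what this PR forwards through roc_auc_score; the data is made up:

import numpy as np
from sklearn.metrics import auc, roc_curve

y_true = np.array(["ham", "spam", "ham", "spam"])
y_score = np.array([0.2, 0.9, 0.3, 0.8])

def auc_for(label):
    # roc_curve already accepts pos_label; this PR forwards the same
    # choice through roc_auc_score.
    fpr, tpr, _ = roc_curve(y_true, y_score, pos_label=label)
    return auc(fpr, tpr)

# Swapping the positive class mirrors the AUC around 0.5, which is what
# the tests check (roc_auc_score_1 == 1 - roc_auc_score_3).
assert np.isclose(auc_for("spam"), 1 - auc_for("ham"))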