[MRG] Adding pos_label parameter to roc_auc_score #9567


Closed
wants to merge 11 commits into from

Conversation

@qinhanmin2014 (Member) commented Aug 16, 2017

Reference Issue

Finish up #6874
Fixes #6873

What does this implement/fix? Explain your changes.

Improvements:
(1) add roc_auc_score to METRICS_WITH_POS_LABEL
(2) simplify the test, since the correctness of roc_auc_score without pos_label is already ensured elsewhere
(3) extend the test to cover str pos_label with binary y_true (see the usage sketch below)
(4) remove the meaningless support for multilabel-indicator y_true
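
A minimal usage sketch of what the PR proposes. The pos_label keyword on roc_auc_score only exists with this branch applied, so the calls below are illustrative rather than something released scikit-learn supports:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Default behaviour: label 1 is treated as the positive class.
auc_default = roc_auc_score(y_true, y_score)

# With this PR's branch, the positive class can be chosen explicitly;
# picking the other label mirrors the score around 0.5, which is what
# the new tests assert.
auc_pos0 = roc_auc_score(y_true, y_score, pos_label=0)
assert np.isclose(auc_default, 1 - auc_pos0)

# str labels with binary y_true are also covered by the new tests.
y_true_str = np.array(['neg', 'neg', 'pos', 'pos'])
roc_auc_score(y_true_str, y_score, pos_label='pos')
```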

Any other comments?

@qinhanmin2014 changed the title from [WIP] Adding pos_label parameter to roc_auc_score to [MRG] Adding pos_label parameter to roc_auc_score on Aug 16, 2017
sample_weight=sample_weight)
return auc(fpr, tpr, reorder=True)

_partial_binary_roc_auc_score = partial(_binary_roc_auc_score,
Member

we're using _average_binary_score here to do multi-label, right? In that case we don't need pos_label (it doesn't make sense). I would check type_of_target(y) here and if we are binary use pos_label and otherwise call _average_binary_score.
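
A rough sketch of the dispatch being suggested here, as I read it. It reuses the private helpers named in the diff above (_binary_roc_auc_score, _average_binary_score); the exact signature is an assumption, not the actual patch:

```python
from sklearn.utils.multiclass import type_of_target

def roc_auc_score_sketch(y_true, y_score, average="macro",
                         sample_weight=None, pos_label=None):
    # Binary target: pos_label is meaningful, so forward it.
    if type_of_target(y_true) == "binary":
        return _binary_roc_auc_score(y_true, y_score, pos_label=pos_label,
                                     sample_weight=sample_weight)
    # Multilabel-indicator target: average the per-column binary scores;
    # pos_label is not used on this path.
    return _average_binary_score(_binary_roc_auc_score, y_true, y_score,
                                 average, sample_weight=sample_weight)
```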

@qinhanmin2014 (Member, Author)

@amueller Thanks. I removed pos_label support for multilabel-indicator y_true and the related test. Is this what you want?

@qinhanmin2014 (Member, Author)

ping @amueller (and maybe @jnothman, @agramfort in the original pull request)
Long time no reply. If possible, please have a look at this continuation. Thanks a lot :)

_partial_binary_roc_auc_score, y_true, y_score, average,
sample_weight=sample_weight)
else:
return _average_binary_score(
Member

What should we do with pos_label in this case?? Use it? Raise an error if it is set?

@@ -257,6 +257,7 @@

"macro_f0.5_score", "macro_f1_score", "macro_f2_score",
"macro_precision_score", "macro_recall_score",
"roc_auc_score",
Member

I don't think this affects any common tests. There are two uses of METRICS_WITH_POS_LABEL. One does not apply, I think; the other I suspect should apply but is being explicitly limited to a handful of metrics in a way that is very ugly and IMO inappropriate...

(Although I'm also confused why roc_auc_score is in METRICS_UNDEFINED_BINARY rather than METRICS_UNDEFINED_MULTICLASS.)

I'd appreciate if you could do a bit of an audit of the common tests wrt roc_auc_score and related metrics.

roc_auc_score_2 = roc_auc_score(y_true_1, y_pred, pos_label=1)
assert_almost_equal(roc_auc_score_1, roc_auc_score_2)
roc_auc_score_3 = roc_auc_score(y_true_1, y_pred, pos_label=0)
assert_almost_equal(roc_auc_score_1, 1-roc_auc_score_3)
Member

spaces around -, please.

PEP8 only recently allowed not having spaces around binary operators, but spaces should only be removed when the grouping is visually helpful, in contrast to other operators with spaces (e.g. a*b + c). It is not, here.

@qinhanmin2014 (Member, Author)

@jnothman Thanks for the review.
(1) I previously supported pos_label for multilabel-indicator y_true in roc_auc_score, but @amueller thinks it is inappropriate. Currently, I ignore the parameter and am waiting for a reply from the community. We can either support multilabel-indicator y_true, raise a warning, or raise an error. Which one would you prefer?
(2) Indeed, simply adding roc_auc_score to METRICS_WITH_POS_LABEL does not affect anything. In fact, I have added the test for pos_label in test_ranking.py. I'll look further into test_common.py (e.g., METRICS_UNDEFINED_BINARY) and will reply soon.

@jnothman (Member) commented Sep 11, 2017 via email

@qinhanmin2014 (Member, Author)

@jnothman Thanks. I think you are asking me to raise an error here? Is the current version what you want?

@jnothman (Member) commented Sep 11, 2017 via email

@qinhanmin2014 (Member, Author)

@jnothman Thanks for your precious time :) I followed your suggestion and now support pos_label=1 for multilabel-indicator y_true (even in this situation, pos_label is not actually used).

@@ -226,6 +227,9 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
sample_weight : array-like of shape = [n_samples], optional
Sample weights.

pos_label : int or str, default=None
The label of the positive class
Member

What does None mean?

Member

Mention "for binary y_true".

roc_auc_score_3 = roc_auc_score(y_true_1, y_pred, pos_label=0)
assert_almost_equal(roc_auc_score_1, 1 - roc_auc_score_3)

# Test int pos_label and binary y_true
Member

This comment is a duplicate of above

roc_auc_score_3 = roc_auc_score(y_true_3, y_pred, pos_label='False')
assert_almost_equal(roc_auc_score_1, 1 - roc_auc_score_3)

# Raise an error for multilabel-indicator y_true with
Member

Everything above this in the test looks like it should be handled in common tests, no?

@qinhanmin2014 (Member, Author)

@jnothman Thanks for the review.
(1) The default value of pos_label is copied from _binary_clf_curve, the function roc_auc_score is based on; None seems equivalent to 1 in that situation. Currently in scikit-learn, some functions use None as the default value of pos_label (e.g., brier_score_loss, precision_recall_curve) while others use 1 as the default (e.g., f1_score, precision_score); see the snippet below.
(2) Indeed, I think I need to reconsider where the test should live, so kindly give me some time. Will ping soon :)
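
As an aside, the split in defaults mentioned in (1) can be seen directly in released scikit-learn; this snippet is only illustration, not part of the patch:

```python
import inspect
from sklearn.metrics import brier_score_loss, f1_score

# brier_score_loss (like precision_recall_curve) defaults to pos_label=None ...
print(inspect.signature(brier_score_loss).parameters['pos_label'].default)  # None
# ... while f1_score (like precision_score) defaults to pos_label=1.
print(inspect.signature(f1_score).parameters['pos_label'].default)          # 1
```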

@qinhanmin2014 (Member, Author)

ping @jnothman
Thanks for the review. Here are some findings.
(1) The default value of pos_label is copied from _binary_clf_curve, the function roc_auc_score is based on; None seems equivalent to 1 in that situation. Currently in scikit-learn, some functions use None as the default value of pos_label (e.g., brier_score_loss, precision_recall_curve) while others use 1 as the default (e.g., f1_score, precision_score).
(2) I moved some tests to test_common according to your instruction.
(3) I also agree that roc_auc_score should be in METRICS_UNDEFINED_MULTICLASS instead of METRICS_UNDEFINED_BINARY. But currently roc_auc_score can't pass the common tests. For example, roc_auc_score will not raise an error even if the shape of sample_weight is not [n_samples] (a possible check is sketched below):
roc_auc_score([1, 0, 1], [0, 0, 1], sample_weight=[1, 2, 3])     # 0.875
roc_auc_score([1, 0, 1], [0, 0, 1], sample_weight=[1, 2, 3, 4])  # 0.875
If you think it is worth fixing, I'll open another pull request for it.
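
To make the missing check concrete, a validation of roughly this shape would be needed somewhere before the score is computed; the helper name here is hypothetical and not part of this PR:

```python
import numpy as np

def _check_sample_weight_shape(y_true, sample_weight):
    # Reject sample_weight that does not provide exactly one weight per sample.
    sample_weight = np.asarray(sample_weight)
    if sample_weight.shape != (len(y_true),):
        raise ValueError("sample_weight has shape %r, expected (%d,)"
                         % (sample_weight.shape, len(y_true)))
    return sample_weight
```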

@jnothman (Member)

Actually, I suspect None should mean "the greater of two values" (unless there is only one value present).

You can fix the common testing of roc_auc_score here or in a new PR.

@@ -595,7 +596,7 @@ def test_invariance_string_vs_numbers_labels():

     for name, metric in THRESHOLDED_METRICS.items():
         if name in ("log_loss", "hinge_loss", "unnormalized_log_loss",
-                    "brier_score_loss"):
+                    "brier_score_loss", "roc_auc_score"):
             # Ugly, but handle case with a pos_label and label
Member

not sure what "and label" means here. Any idea?

What metrics are we excluding here? Why do we need an explicit list of names where otherwise these lists are defined at the top with meaningful names?

I know it's not your problem, but this is a mess :\

@qinhanmin2014 (Member, Author)

ping @jnothman

Actually, I suspect None should mean "the greater of two values" (unless there is only one value present).

This is the case for brier_score_loss but not the case here. See _binary_clf_curve in ranking.py: there, pos_label=None means that only {-1, 1} / {0, 1} y_true is accepted and 1 is used as pos_label. Otherwise, an error is raised (Data is not binary and pos_label is not specified).
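
A simplified sketch of that check in _binary_clf_curve, paraphrased from sklearn/metrics/ranking.py as it was at the time; the wrapper name is mine:

```python
import numpy as np

def _resolve_pos_label(y_true, pos_label=None):
    classes = np.unique(y_true)
    if pos_label is None:
        # Without an explicit pos_label, only {-1, 1} / {0, 1} style labels
        # are accepted, and 1 is taken as the positive class.
        if not (np.array_equal(classes, [0, 1]) or
                np.array_equal(classes, [-1, 1]) or
                np.array_equal(classes, [0]) or
                np.array_equal(classes, [-1]) or
                np.array_equal(classes, [1])):
            raise ValueError("Data is not binary and pos_label is not specified")
        return 1.
    return pos_label
```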

You can fix the common testing of roc_auc_score here or in a new PR.

The problem does not seem so simple; I'll open another PR and ping you.

not sure what "and label" means here. Any idea?

The comment was introduced in a798d9e. I tend to believe it is a simple copy-paste from above (CLASSIFICATION_METRICS) without actually being implemented. Some of it is on the TODO list (see the TODO mark before METRICS_WITH_LABELS).

What metrics are we excluding here? Why do we need an explicit list of names where otherwise these lists are defined at the top with meaningful names?

The excluded metrics:

roc_auc_score # removed in this PR
macro_roc_auc
micro_roc_auc
samples_roc_auc
weighted_roc_auc

average_precision_score
macro_average_precision_score
micro_average_precision_score
samples_average_precision_score
weighted_average_precision_score

coverage_error
label_ranking_average_precision_score
label_ranking_loss

If needed, I'll further investigate.

@qinhanmin2014 (Member, Author)

ping @jnothman for the previous comment along with the PR itself.
Could you please spare some time to take care of this PR? Thanks.

@jnothman (Member) left a comment

I don't know what you want me to do with this PR, since the change is not tested.

@@ -226,6 +227,9 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
sample_weight : array-like of shape = [n_samples], optional
Sample weights.

pos_label : int or str, default=None
The label of the positive class. Only make sense for binary y_true.
Member

default=None means nothing here. Don't say it. You can say "optional" then describe the default below. I think the default should be to take the greater of two class labels, or 1 if only one class is present. State that for multilabel, it is fixed to 1.
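
One possible docstring wording that follows this suggestion (only a sketch of how it could read; the default behaviour itself was still being discussed at this point):

```
    pos_label : int or str, optional
        The label of the positive class. Only meaningful for binary y_true.
        By default, the greater of the two class labels is taken as the
        positive class, or 1 if only one class is present. For
        multilabel-indicator y_true, pos_label is fixed to 1.
```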

sample_weight=sample_weight)
else:
if pos_label is not None and pos_label != 1:
raise ValueError("Parameter pos_label doesn't make sense for "
Member

Say that it is fixed to 1 rather than doesn't make sense. It does make sense but we refuse to make it an option, I think...
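
A small sketch of the reworded check (the helper name and error message are assumptions, not the merged code):

```python
def _check_pos_label_for_multilabel(pos_label):
    # For multilabel-indicator y_true, pos_label is fixed to 1: reject any
    # other value instead of saying the parameter "doesn't make sense".
    if pos_label is not None and pos_label != 1:
        raise ValueError("Parameter pos_label is fixed to 1 for "
                         "multilabel-indicator y_true")
```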

@qinhanmin2014 (Member, Author)

@jnothman Thanks. Comments addressed.

I don't know what you want me to do with this PR, since the change is not tested.

I just want to iteratively improve the code until it seems OK :)
It seems that all the cases have been tested. Could you please tell me what else I need to test? Thanks.

I think the default should be to take the greater of two class labels

From my perspective, this PR mainly aims to expose in the top-level functions (e.g., roc_auc_score in ranking.py) what we already have in the low-level ones. If we want to change the behavior of pos_label=None, we will need to modify the existing low-level functions (e.g., _binary_clf_curve in ranking.py).

@jnothman (Member) commented Sep 19, 2017 via email
