[MRG] Adding pos_label parameter to roc_auc_score #9567


Closed
wants to merge 11 commits into from

Conversation

@qinhanmin2014 (Member) commented Aug 16, 2017

Reference Issue

Finish up #6874
Fixes #6873

What does this implement/fix? Explain your changes.

Improvements:
(1) add roc_auc_score to METRICS_WITH_POS_LABEL
(2) simplify the test, since the correctness of roc_auc_score without pos_label is already ensured elsewhere
(3) extend the test to cover str pos_label with binary y_true (see the usage sketch below)
(4) remove the meaningless support for multilabel-indicator y_true
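
A minimal usage sketch of what the PR proposes. The pos_label keyword on roc_auc_score only exists with this branch applied, so the calls below are illustrative rather than something released scikit-learn supports:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Default behaviour: label 1 is treated as the positive class.
auc_default = roc_auc_score(y_true, y_score)

# With this PR's branch, the positive class can be chosen explicitly;
# picking the other label mirrors the score around 0.5, which is what
# the new tests assert.
auc_pos0 = roc_auc_score(y_true, y_score, pos_label=0)
assert np.isclose(auc_default, 1 - auc_pos0)

# str labels with binary y_true are also covered by the new tests.
y_true_str = np.array(['neg', 'neg', 'pos', 'pos'])
roc_auc_score(y_true_str, y_score, pos_label='pos')
```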

Any other comments?

@qinhanmin2014 changed the title from [WIP] Adding pos_label parameter to roc_auc_score to [MRG] Adding pos_label parameter to roc_auc_score on Aug 16, 2017
sample_weight=sample_weight)
return auc(fpr, tpr, reorder=True)

_partial_binary_roc_auc_score = partial(_binary_roc_auc_score,
Member

we're using _average_binary_score here to do multi-label, right? In that case we don't need pos_label (it doesn't make sense). I would check type_of_target(y) here and if we are binary use pos_label and otherwise call _average_binary_score.
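
A rough sketch of the dispatch being suggested here, as I read it. It reuses the private helpers named in the diff above (_binary_roc_auc_score, _average_binary_score); the exact signature is an assumption, not the actual patch:

```python
from sklearn.utils.multiclass import type_of_target

def roc_auc_score_sketch(y_true, y_score, average="macro",
                         sample_weight=None, pos_label=None):
    # Binary target: pos_label is meaningful, so forward it.
    if type_of_target(y_true) == "binary":
        return _binary_roc_auc_score(y_true, y_score, pos_label=pos_label,
                                     sample_weight=sample_weight)
    # Multilabel-indicator target: average the per-column binary scores;
    # pos_label is not used on this path.
    return _average_binary_score(_binary_roc_auc_score, y_true, y_score,
                                 average, sample_weight=sample_weight)
```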

@qinhanmin2014 (Member, Author)

@amueller Thanks. I removed pos_label support for multilabel-indicator y_true and the related test. Is this what you want?

@qinhanmin2014 (Member, Author)

ping @amueller (and maybe @jnothman, @agramfort in the original pull request)
Long time no reply. If possible, please have a look at this continuation. Thanks a lot :)

_partial_binary_roc_auc_score, y_true, y_score, average,
sample_weight=sample_weight)
else:
return _average_binary_score(
Member

What should we do with pos_label in this case?? Use it? Raise an error if it is set?

@@ -257,6 +257,7 @@

"macro_f0.5_score", "macro_f1_score", "macro_f2_score",
"macro_precision_score", "macro_recall_score",
"roc_auc_score",
Member

I don't think this affects any common tests. There are two uses of METRICS_WITH_POS_LABEL. One does not apply, I think; the other I suspect should apply but is being explicitly limited to a handful of metrics in a way that is very ugly and IMO inappropriate...

(Although I'm also confused why roc_auc_score is in METRICS_UNDEFINED_BINARY rather than METRICS_UNDEFINED_MULTICLASS.)

I'd appreciate if you could do a bit of an audit of the common tests wrt roc_auc_score and related metrics.

roc_auc_score_2 = roc_auc_score(y_true_1, y_pred, pos_label=1)
assert_almost_equal(roc_auc_score_1, roc_auc_score_2)
roc_auc_score_3 = roc_auc_score(y_true_1, y_pred, pos_label=0)
assert_almost_equal(roc_auc_score_1, 1-roc_auc_score_3)
Member

spaces around -, please.

PEP8 only recently allowed not having spaces around binary operators, but spaces should only be removed when the grouping is visually helpful, in contrast to other operators with spaces (e.g. a*b + c). It is not, here.

@qinhanmin2014 (Member, Author)

@jnothman Thanks for the review.
(1) I previously supported pos_label for multilabel-indicator y_true in roc_auc_score, but @amueller thinks it is inappropriate. Currently, I ignore the parameter and am waiting for a reply from the community. We can either support multilabel-indicator y_true, raise a warning, or raise an error. Which one would you prefer?
(2) Indeed, simply adding roc_auc_score to METRICS_WITH_POS_LABEL does not affect anything. In fact, I have added the test for pos_label in test_ranking.py. I'll look further into test_common.py (e.g., METRICS_UNDEFINED_BINARY) and will reply soon.

@jnothman (Member) commented Sep 11, 2017 via email

@qinhanmin2014 (Member, Author)

@jnothman Thanks. I think you are asking me to raise an error here? Is the current version what you want?

@jnothman (Member) commented Sep 11, 2017 via email

@qinhanmin2014 (Member, Author)

@jnothman Thanks for your precious time :) I followed your suggestion and now support pos_label=1 for multilabel-indicator y_true (even in this situation, pos_label is not actually used).

@@ -226,6 +227,9 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
sample_weight : array-like of shape = [n_samples], optional
Sample weights.

pos_label : int or str, default=None
The label of the positive class
Member

What does None mean?

Member

Mention "for binary y_true".

roc_auc_score_3 = roc_auc_score(y_true_1, y_pred, pos_label=0)
assert_almost_equal(roc_auc_score_1, 1 - roc_auc_score_3)

# Test int pos_label and binary y_true
Member

This comment is a duplicate of above

roc_auc_score_3 = roc_auc_score(y_true_3, y_pred, pos_label='False')
assert_almost_equal(roc_auc_score_1, 1 - roc_auc_score_3)

# Raise an error for multilabel-indicator y_true with
Member

Everything above this in the test looks like it should be handled in common tests, no?

@qinhanmin2014 (Member, Author)

@jnothman Thanks for the review.
(1) The default value of pos_label is copied from _binary_clf_curve, the function roc_auc_score is based on; None seems equivalent to 1 in that situation. Currently in scikit-learn, some functions use None as the default value of pos_label (e.g., brier_score_loss, precision_recall_curve) while others use 1 as the default (e.g., f1_score, precision_score); see the snippet below.
(2) Indeed, I think I need to reconsider where the test should live, so kindly give me some time. Will ping soon :)
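
As an aside, the split in defaults mentioned in (1) can be seen directly in released scikit-learn; this snippet is only illustration, not part of the patch:

```python
import inspect
from sklearn.metrics import brier_score_loss, f1_score

# brier_score_loss (like precision_recall_curve) defaults to pos_label=None ...
print(inspect.signature(brier_score_loss).parameters['pos_label'].default)  # None
# ... while f1_score (like precision_score) defaults to pos_label=1.
print(inspect.signature(f1_score).parameters['pos_label'].default)          # 1
```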

@qinhanmin2014 (Member, Author)

ping @jnothman
Thanks for the review. Here are some findings.
(1) The default value of pos_label is copied from _binary_clf_curve, the function roc_auc_score is based on; None seems equivalent to 1 in that situation. Currently in scikit-learn, some functions use None as the default value of pos_label (e.g., brier_score_loss, precision_recall_curve) while others use 1 as the default (e.g., f1_score, precision_score).
(2) I moved some tests to test_common according to your instruction.
(3) I also agree that roc_auc_score should be in METRICS_UNDEFINED_MULTICLASS instead of METRICS_UNDEFINED_BINARY. But currently roc_auc_score can't pass the common tests. For example, roc_auc_score will not raise an error even if the shape of sample_weight is not [n_samples] (a possible check is sketched below):
roc_auc_score([1, 0, 1], [0, 0, 1], sample_weight=[1, 2, 3])     # 0.875
roc_auc_score([1, 0, 1], [0, 0, 1], sample_weight=[1, 2, 3, 4])  # 0.875
If you think it is worth fixing, I'll open another pull request for it.
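
To make the missing check concrete, a validation of roughly this shape would be needed somewhere before the score is computed; the helper name here is hypothetical and not part of this PR:

```python
import numpy as np

def _check_sample_weight_shape(y_true, sample_weight):
    # Reject sample_weight that does not provide exactly one weight per sample.
    sample_weight = np.asarray(sample_weight)
    if sample_weight.shape != (len(y_true),):
        raise ValueError("sample_weight has shape %r, expected (%d,)"
                         % (sample_weight.shape, len(y_true)))
    return sample_weight
```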

@jnothman (Member)

Actually, I suspect None should mean "the greater of two values" (unless there is only one value present).

You can fix the common testing of roc_auc_score here or in a new PR.

@@ -595,7 +596,7 @@ def test_invariance_string_vs_numbers_labels():

     for name, metric in THRESHOLDED_METRICS.items():
         if name in ("log_loss", "hinge_loss", "unnormalized_log_loss",
-                    "brier_score_loss"):
+                    "brier_score_loss", "roc_auc_score"):
             # Ugly, but handle case with a pos_label and label
Member

not sure what "and label" means here. Any idea?

What metrics are we excluding here? Why do we need an explicit list of names where otherwise these lists are defined at the top with meaningful names?

I know it's not your problem, but this is a mess :\

@qinhanmin2014 (Member, Author)

ping @jnothman

Actually, I suspect None should mean "the greater of two values" (unless there is only one value present).

This is the case for brier_score_loss but not the case here. See _binary_clf_curve in ranking.py: there, pos_label=None means that only {-1, 1} / {0, 1} y_true is accepted and 1 is used as pos_label. Otherwise, an error is raised (Data is not binary and pos_label is not specified).
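
A simplified sketch of that check in _binary_clf_curve, paraphrased from sklearn/metrics/ranking.py as it was at the time; the wrapper name is mine:

```python
import numpy as np

def _resolve_pos_label(y_true, pos_label=None):
    classes = np.unique(y_true)
    if pos_label is None:
        # Without an explicit pos_label, only {-1, 1} / {0, 1} style labels
        # are accepted, and 1 is taken as the positive class.
        if not (np.array_equal(classes, [0, 1]) or
                np.array_equal(classes, [-1, 1]) or
                np.array_equal(classes, [0]) or
                np.array_equal(classes, [-1]) or
                np.array_equal(classes, [1])):
            raise ValueError("Data is not binary and pos_label is not specified")
        return 1.
    return pos_label
```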

You can fix the common testing of roc_auc_score here or in a new PR.

The problem does not seem so simple; I'll open another PR and ping you.

not sure what "and label" means here. Any idea?

The comment was introduced in a798d9e. I tend to believe it is a simple copy-paste from above (CLASSIFICATION_METRICS) without actually being implemented. Some of it is on the TODO list (see the TODO mark before METRICS_WITH_LABELS).

What metrics are we excluding here? Why do we need an explicit list of names where otherwise these lists are defined at the top with meaningful names?

The excluded metrics:

roc_auc_score # removed in this PR
macro_roc_auc
micro_roc_auc
samples_roc_auc
weighted_roc_auc

average_precision_score
macro_average_precision_score
micro_average_precision_score
samples_average_precision_score
weighted_average_precision_score

coverage_error
label_ranking_average_precision_score
label_ranking_loss

If needed, I'll further investigate.

@qinhanmin2014 (Member, Author)

ping @jnothman for the previous comment along with the PR itself.
Could you please spare some time to take care of this PR? Thanks.

@jnothman (Member) left a comment

I don't know what you want me to do with this PR, since the change is not tested.

@@ -226,6 +227,9 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
sample_weight : array-like of shape = [n_samples], optional
Sample weights.

pos_label : int or str, default=None
The label of the positive class. Only make sense for binary y_true.
Member

default=None means nothing here. Don't say it. You can say "optional" then describe the default below. I think the default should be to take the greater of two class labels, or 1 if only one class is present. State that for multilabel, it is fixed to 1.
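
One possible docstring wording that follows this suggestion (only a sketch of how it could read; the default behaviour itself was still being discussed at this point):

```
    pos_label : int or str, optional
        The label of the positive class. Only meaningful for binary y_true.
        By default, the greater of the two class labels is taken as the
        positive class, or 1 if only one class is present. For
        multilabel-indicator y_true, pos_label is fixed to 1.
```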

sample_weight=sample_weight)
else:
if pos_label is not None and pos_label != 1:
raise ValueError("Parameter pos_label doesn't make sense for "
Member

Say that it is fixed to 1 rather than doesn't make sense. It does make sense but we refuse to make it an option, I think...
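
A small sketch of the reworded check (the helper name and error message are assumptions, not the merged code):

```python
def _check_pos_label_for_multilabel(pos_label):
    # For multilabel-indicator y_true, pos_label is fixed to 1: reject any
    # other value instead of saying the parameter "doesn't make sense".
    if pos_label is not None and pos_label != 1:
        raise ValueError("Parameter pos_label is fixed to 1 for "
                         "multilabel-indicator y_true")
```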

@qinhanmin2014 (Member, Author)

@jnothman Thanks. Comments addressed.

I don't know what you want me to do with this PR, since the change is not tested.

I just want to iteratively improve the code until it seems OK :)
It seems that all the cases have been tested. Could you please tell me what else I need to test? Thanks.

I think the default should be to take the greater of two class labels

From my perspective, this PR mainly aims to expose in the top-level functions (e.g., roc_auc_score in ranking.py) what we already have in the low-level ones. If we want to change the behavior of pos_label=None, we will need to modify the existing low-level functions (e.g., _binary_clf_curve in ranking.py).

@jnothman (Member) commented Sep 19, 2017 via email
