-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX LogisticRegressionCV.score
and _BaseScorer
metadata routing
#30859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
How did you discover these bugs? Edit: he it was related to the issue, sorry didn't see in time. |
err_msg = re.escape( | ||
"[sample_weight] are passed but are not explicitly set as requested or not" | ||
" requested for _Scorer.score, which is used within test.score. Call" | ||
" `_Scorer.set_score_request({metadata}=True/False)` for each" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can replace _Scorer
with __repr__
once #30946 is merged.
@@ -1762,6 +1764,8 @@ class LogisticRegressionCV(LogisticRegression, LinearClassifierMixin, BaseEstima | ||
0.98... | ||
""" | ||
|
||
# TODO(1.9): remove this when sample_weight is removed from the `score` signature | ||
__metadata_request__score = {"sample_weight": metadata_routing.UNUSED} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to avoid set_score_request
be present on this class
@@ -2262,18 +2262,18 @@ def test_lr_cv_scores_differ_when_sample_weight_is_requested(): | ||
sample_weight[: len(y) // 2] = 2 | ||
kwargs = {"sample_weight": sample_weight} | ||
|
||
scorer1 = get_scorer("accuracy") | ||
scorer1 = get_scorer("accuracy").set_score_request(sample_weight=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that scorers properly request their routing, this is required since we're passing sample weight bellow.
score_method=self._score_func, | ||
ignore_params={"y_true", "y_pred"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we explicitly pass the _score_func
so that the right metadata can be deducted from its signature.
And we need to ignore y_true
and y_pred
in that process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to ignore y_prob
, y_proba
, y_score
, labels_true
, labels_pred
, pred_decision
as well ? (Some of the various names the first two args of a score function can have). Maybe an easier way to skip them all would be to ignore the first two positional arguments of the score function ?
@OmarManzoor @antoinebaker this is ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @adrinjalali
# TODO(1.9): remove this when sample_weight is removed from the `score` signature | ||
# and remove `sample_weight` from the `score` signature |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make it clear
# TODO(1.9): remove this when sample_weight is removed from the `score` signature | |
# and remove `sample_weight` from the `score` signature | |
# TODO(1.9): remove this decorator along with `sample_weight` from the `score` | |
# signature |
@@ -2231,13 +2242,14 @@ def score(self, X, y, sample_weight=None, **score_params): | ||
Score of self.predict(X) w.r.t. y. | ||
""" | ||
_raise_for_params(score_params, self, "score") | ||
if sample_weight is not None: | ||
score_params["sample_weight"] = sample_weight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we intend on removing sample_weight
then shouldn't we also update the condition where routing is not enabled to be:
if "sample_weight " in score_params:
instead of if sample_weight is not None:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first round of reviews, but from my limited understanding of the metadata routing API, I probably don't get all the logic right 😉
score_method=self._score_func, | ||
ignore_params={"y_true", "y_pred"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to ignore y_prob
, y_proba
, y_score
, labels_true
, labels_pred
, pred_decision
as well ? (Some of the various names the first two args of a score function can have). Maybe an easier way to skip them all would be to ignore the first two positional arguments of the score function ?
# sample_weight cannot be passed to lr_cv1.score since it's unrequested. | ||
score_1 = lr_cv1.score(X_t, y_t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could test here that lr_cv1.score(X_t, y_t, **kwargs)
raises the appropriate error.
cls._build_request_for_signature( | ||
method_name=method, | ||
method_obj=score_method if score_method != "score" else None, | ||
ignore_params=ignore_params, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why the generic _get_default_requests
and _build_request_for_signature
need to be modified specifically for scorers ? Is it because in general we rely on the method signature, but here for scorers we instead rely on the scorer._scorer_func
signature ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, I am wondering if we should rather redefine _get_default_requests
and _build_request_for_signature
in _BaseScorer
(instead of changing the _MetadataRequester
mixin).
# TODO (1.9): remove in 1.9 | ||
@_deprecate_positional_args(version="1.9") | ||
def __call__(self, estimator, X, y_true, *, sample_weight=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why do we need to deprecate sample_weight
as a positional arg ? Is it directly related to this PR or is it a general guideline ?
Fixes #30817
Two issues fixed in this PR:
LogisticRegressionCV
had asample_weight
arg in itsscore
, which makes it a consumer of it, while being a router. This PR removessample_weight
as a consumer arg from that method_BaseScorer
wasn't implementing aget_metadata_routing
and as a result the default implementation wasn't correctly detectingsample_weight
in the__call__
signature of those scorersNeeds tests and refining the error message regarding
scorer.__call__
cc @Dalesrox, @antoinebaker