Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

FEAT enable default routing strategy machinery #30946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
Loading
from

Conversation

adrinjalali
Copy link
Member

This creates the machinery needed to setup a default routing for metadata on the estimator level.

Copy link

github-actions bot commented Mar 5, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fcbc431. Link to the linter CI: here

@adrinjalali adrinjalali marked this pull request as ready for review March 6, 2025 12:16
@adrinjalali
Copy link
Member Author

@ogrisel ogrisel self-requested a review March 7, 2025 19:14
@virchan virchan added the Metadata Routing all issues related to metadata routing, slep006, sample props label Mar 9, 2025
@ogrisel
Copy link
Member

ogrisel commented Mar 11, 2025

Thanks, @adrinjalali, for the PR. I opened a PR to this PR to actually define the default policy for sample_weight in BaseEstimator:

adrinjalali#5

Once we agree on this, and possibly on a little refactoring to also do it for the groups metadata of splits, I would like to follow up with another PR to this PR to actually make the default policy enabled by default when enable_metadata_routing=True and instead expose a new enabled_metadata_routing="empty_requests", enabled_metadta_routing="verbose_requests" or enabled_metadta_routing="explicit_requests" to implement the fully empty requests scheme.

This would require reworking the example further, so I prefer to decouple the two concerns to ease the review process.

@adrinjalali
Copy link
Member Author

Doing your proposed change in BaseEstimator would affect all children, including third party estimators, in a non-equal-footing way. Especially when combining this with enable_metadata_routing=True, by default all metadata in sklearn that we care about is requested, but the metadata accepted by third party libs is not.

I personally prefer to have a more "explicit" way to enable default routing than saying enable_metadata_routing=True, but if we're doing this, to keep the consistency, I see two paths forward:

  1. Request all metadata by default with enable_metadata_routing=True, which would include groups and sample_weight
  2. Enabling this default routing for objects which actually have sample_weight somewhere. For instance, for score in ClassifierMixin and RegressorMixin. It should solve our issue w/o making "smart" decisions. We can then have a FitSampleWeightRequesterMixin to add to the ones who have fit methods with sample_weight, and add that explicitly to those estimators / estimator groups (like a base class for a group of estimators).

As for groups in split objects, I intentionally didn't include that, since there's a difference between groups and sample_weight. In split objects, they raise if groups is not requested. So there's no scenario where it can be not passed by the split object works; whereas sample_weight is always optional, and all objects who accept it, work w/o it. Therefore I think it makes sense to keep groups setting as is.

@antoinebaker
Copy link
Contributor

antoinebaker commented Mar 18, 2025

I feel the current naming "default_routing" could be confusing because "default" is polysemic.
enable_metadata_routing="default_routing" means, as a request policy, to use the instance level requests as defaults. But it could be misleading to call it "default_routing", as it is not currently the default strategy, and isn't about the routing mechanics per se but more about the requests.

Maybe keeping enable_metadata_routing : bool, None as before, and adding a new config variable could help ?
Something like metadata_request_policy : str to choose among the various policies we would like to implement and test.
For instance metadata_request_policy = "empty", "explicit", "verbose" for the ones mentioned in #30946 (comment).

I am not sure how to name the policy implemented in this PR ("instance_defaults", "auto" ?) nor the old policy ("classwise_defaults" ?). But for backward compatibility we should maybe keep the old one as the default, until we are confident about the right policy and make it the new default in a future release ?

@StefanieSenger
Copy link
Contributor

StefanieSenger commented Mar 20, 2025

I feel the current naming "default_routing" could be confusing because "default" is polysemic.

I also think that's a problem.

Also the name of the new method/attribute __sklearn_default_request__ is so similar to the __metadata_request__* attributes, that it requires getting used to this part of the code to not constantly confuse them. It would be better to use "speaking names".

Just thinking aloud:

__sklearn_default_request__ could maybe be __sklearn_default_metadata_requests__ ?

__metadata_request__* could maybe be __sklearn_available_metadata_requests__*?

@@ -0,0 +1,7 @@
- |Feature| Added support for default metadata routing through
``set_config(enable_metadata_routing="default_routing")``. This enables
automatic routing of common metadata like ``sample_weight`` and ``groups`` by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
automatic routing of common metadata like ``sample_weight`` and ``groups`` by
automatic routing of ``sample_weight`` and ``groups`` metadata (also ...) by

Can we be very explicit when communicating what exactly is included in the default routing? In the changelog and in the user guide, I would list all of them.

print_routing(clf)

# %%
# The routing can still be modified using set_*_request methods
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The routing can still be modified using set_*_request methods
# Users can still modify the routing using set_*_request methods

routing.

2. Before the user sets any specific routing, via `_get_default_requests`, it
provides the default metadata routing configuration for the instance. This
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
provides the default metadata routing configuration for the instance. This
provides the available metadata routing configuration for the instance. This

Maybe mention that it provides all the available routings.

For example, if a method's signature includes `sample_weight`, this method will:
- During class creation: Create a `set_{method}_request` method to configure
how `sample_weight` should be routed
- Right after initialization: Provide the default routing configuration for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Right after initialization: Provide the default routing configuration for
- Right after initialization: Provide the available routing configuration for

for method, method_requests in defaults.items():
for pname, value in method_requests.items():
getattr(requests, method).add_request(param=pname, alias=value)
return requests
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of _get_default_requests() returning the class requests if _default_routing_enabled() is False?

Wouldn't it be a bug if an estimator doesn't implement their own default request and then as a result metadata is routed to all the methods where that is available in the routing?

Thinking about us then adding a new routing, and suddenly the default request would change.

Edit: Correcting myself here a bit: I have seen that this method gets called by _get_metadata_request() and of cause something needs to get passed there also if _default_routing_enabled() is False. But then I would think what gets passed is not a default request anymore. I am still concerned about the terminology.

Comment on lines +488 to +489
# Instance-level default requests can override class-level defaults
# and add new method requests
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Instance-level default requests can override class-level defaults
# and add new method requests
# Sets default requests for the instance chosen from the available
# __metadata_request__* class attributes.

Comment on lines +1457 to +1462
- Collecting metadata request info from `__metadata_request__*` class
attributes
- Analyzing method signatures for implicit metadata parameters
The collected information is used to create `set_{method}_request` methods
(e.g. set_fit_request) that allow runtime configuration of metadata
routing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Collecting metadata request info from `__metadata_request__*` class
attributes
- Analyzing method signatures for implicit metadata parameters
The collected information is used to create `set_{method}_request` methods
(e.g. set_fit_request) that allow runtime configuration of metadata
routing.
- Analyzing method signatures for implicit metadata parameters
The collected information is used to create `set_{method}_request` methods
(e.g. set_fit_request) that allow runtime configuration of metadata
routing.
- Collecting metadata request info from `__metadata_request__*` class
attributes

I would turn the order around to be the same as the methods does it (lowers mental load for third party developers who want to use these.

(e.g. set_fit_request) that allow runtime configuration of metadata
routing.

2. Before the user sets any specific routing, via `_get_default_requests`, it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. Before the user sets any specific routing, via `_get_default_requests`, it
2. Before the user sets any specific routing, via `__sklearn_default_request__`, it

It is __sklearn_default_request__, isn't it?

assert _default_routing_enabled() == default_routing


def test_default_instance_routing_overrides_class_level():
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already test_setting_default_requests() in this file and maybe these tests can be either merged or more clearer separated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metadata Routing all issues related to metadata routing, slep006, sample props module:model_selection module:utils
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.