TST remove _required_parameters and improve instance generation #29707

Merged: 23 commits merged into scikit-learn:main on Sep 6, 2024

adrinjalali
Member

This basically requires #29699 and #29702 to be merged first.

This PR refactors instance generation so that _required_parameters is no longer needed. This also means estimators are allowed to have __init__ parameters without default values, which is already the case in practice.
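For context, _required_parameters was a class attribute in which an estimator listed the __init__ parameters that have no default, so the common tests knew they could not simply call the class with no arguments. A minimal sketch of the alternative, assuming toy classes and a hypothetical INIT_PARAMS-style mapping (this is not scikit-learn's actual code): required parameters can be discovered with inspect.signature, and the test side supplies values for them when constructing instances.

```python
import inspect


class ToyRegressor:
    """Hypothetical estimator: every __init__ parameter has a default."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class ToyPipeline:
    """Hypothetical meta-estimator: ``steps`` has no default value."""

    def __init__(self, steps, verbose=False):
        self.steps = steps
        self.verbose = verbose


# Hypothetical test-side mapping from a class to the constructor arguments
# the common tests should use (playing the role of INIT_PARAMS).
INIT_PARAMS = {ToyPipeline: dict(steps=[("reg", ToyRegressor())])}

# *args / **kwargs never have defaults but are not "required" either.
VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def required_init_params(cls):
    """Names of __init__ parameters without a default value."""
    sig = inspect.signature(cls.__init__)
    return [
        name
        for name, param in sig.parameters.items()
        if name != "self"
        and param.default is inspect.Parameter.empty
        and param.kind not in VARIADIC
    ]


def construct_instance(cls):
    """Build an instance from test-side params instead of a class attribute."""
    params = INIT_PARAMS.get(cls, {})
    missing = set(required_init_params(cls)) - set(params)
    if missing:
        raise ValueError(f"{cls.__name__} needs values for {sorted(missing)}")
    return cls(**params)


print(required_init_params(ToyPipeline))  # ['steps']
print(construct_instance(ToyPipeline).steps[0][0])  # reg
```

The design point is that the knowledge of "what to pass" moves from the estimator class into the test suite, so third-party estimators no longer have to declare anything extra.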

github-actions bot commented Aug 23, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4a99a3f. Link to the linter CI: here

adrinjalali added the No Changelog Needed and Developer API (Third party developer API related) labels Aug 26, 2024

Member Author

adrinjalali left a comment

@glemaitre this is where I continue the work on instance creation that you also mentioned on the other PR.

@@ -430,15 +444,8 @@ def _get_check_estimator_ids(obj):
return re.sub(r"\s", "", str(obj))


def _generate_column_transformer_instances():
Member Author

This isn't removed; it's now yielded alongside all other estimators.
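A rough sketch of what "yielded alongside all other estimators" can look like, assuming a hypothetical list of classes and a hypothetical mapping from a class to a factory of hand-built instances (all names here are illustrative, not scikit-learn's): the generator yields one default instance per class unless the class has a factory, in which case it yields the factory's instances, so special cases no longer need their own generator.

```python
class ToyScaler:
    """Hypothetical transformer with only default parameters."""

    def __init__(self, with_mean=True):
        self.with_mean = with_mean


class ToyColumnTransformer:
    """Hypothetical meta-estimator that cannot be built with no arguments."""

    def __init__(self, transformers):
        self.transformers = transformers


ALL_ESTIMATORS = [ToyScaler, ToyColumnTransformer]

# Hypothetical: classes needing hand-built instances map to a factory that
# returns fresh instances, so tests never share (and mutate) the same object.
CUSTOM_INSTANCES = {
    ToyColumnTransformer: lambda: [
        ToyColumnTransformer(transformers=[("scale", ToyScaler(), [0])]),
    ],
}


def yield_instances(estimators):
    """Yield every instance the common tests should run on, in one stream."""
    for cls in estimators:
        factory = CUSTOM_INSTANCES.get(cls)
        if factory is not None:
            yield from factory()
        else:
            yield cls()


for est in yield_instances(ALL_ESTIMATORS):
    print(type(est).__name__)
```

Using factories rather than module-level instances is a choice made for this sketch; it keeps each test's instance independent.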

Comment on lines -191 to -192
HalvingGridSearchCV: dict(cv=3),
HalvingRandomSearchCV: dict(cv=3),
Member Author

These two are set manually in instance generation.

@@ -1320,6 +1320,21 @@ def get_metadata_routing(self):

        return router

    def _more_tags(self):
        return {
            "_xfail_checks": {
Member Author

ColumnTransformer is now tested alongside all other estimators, hence more checks fail; they need to be ignored for now and fixed later.

@@ -379,6 +379,9 @@ def _more_tags(self):
                    "Fail during parameter check since min/max resources requires"
                    " more samples"
                ),
                "check_estimators_nan_inf": "FIXME",
Member Author

These estimators are now also tested together with the others, and these checks fail. They need fixes in another PR.
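The _xfail_checks entries in these hunks map a check name to a reason. A simplified sketch of how a runner could consume them, assuming toy checks and a toy estimator (this is not scikit-learn's actual runner, which works through its own test machinery): a listed check that fails is recorded as an expected failure, while a failure in an unlisted check still propagates.

```python
def check_always_passes(name, estimator):
    """Hypothetical check that never fails."""


def check_estimators_nan_inf(name, estimator):
    # Hypothetical body standing in for the real common check.
    raise AssertionError(f"{name} does not handle NaN/inf yet")


class ToySearchCV:
    """Hypothetical estimator declaring a known failure via _more_tags."""

    def _more_tags(self):
        return {"_xfail_checks": {"check_estimators_nan_inf": "FIXME"}}


def run_checks(estimator, checks):
    """Run checks; failures listed in _xfail_checks become expected failures."""
    tags = estimator._more_tags() if hasattr(estimator, "_more_tags") else {}
    xfail = tags.get("_xfail_checks", {})
    name = type(estimator).__name__
    results = {}
    for check in checks:
        try:
            check(name, estimator)
        except AssertionError:
            if check.__name__ not in xfail:
                raise  # an unexpected failure must still fail the run
            results[check.__name__] = f"xfail ({xfail[check.__name__]})"
        else:
            results[check.__name__] = "passed"
    return results


print(run_checks(ToySearchCV(), [check_always_passes, check_estimators_nan_inf]))
```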

@@ -1881,6 +1881,15 @@ def get_metadata_routing(self):

        return router

    def _more_tags(self):
        return {
            "_xfail_checks": {
Member Author

same here

@@ -309,6 +311,8 @@ def _estimators_that_predict_in_fit():
    "estimator", column_name_estimators, ids=_get_check_estimator_ids
)
def test_pandas_column_name_consistency(estimator):
    if isinstance(estimator, ColumnTransformer):
Member Author

This test and test_check_param_validation are being moved to estimator_checks, after which they can be skipped via the estimator tags. For now, they need to be skipped by hardcoding it here; the PR moving these tests will fix this as well.

Comment on lines +3259 to +3273
def check_estimator_cloneable(name, estimator_orig):
    """Checks whether the estimator can be cloned."""
    try:
        clone(estimator_orig)
    except Exception as e:
        raise AssertionError(f"Cloning of {name} failed with error: {e}.") from e


def check_estimator_repr(name, estimator_orig):
    """Check that the estimator has a functioning repr."""
    estimator = clone(estimator_orig)
    try:
        repr(estimator)
    except Exception as e:
        raise AssertionError(f"Repr of {name} failed with error: {e}.") from e
Member Author

These two used to be part of check_parameters_default_constructible and have now been moved out into their own checks.
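To exercise the two checks above without installing scikit-learn, here is a sketch that copies them unchanged but swaps in copy.deepcopy as a stand-in for sklearn.base.clone (the real clone builds a new, unfitted estimator from get_params(), so this is only an approximation), and runs them on a hypothetical estimator whose __repr__ raises. Each check receives an instance, not a class.

```python
import copy


def clone(estimator):
    # Stand-in for sklearn.base.clone so the sketch runs without scikit-learn.
    return copy.deepcopy(estimator)


def check_estimator_cloneable(name, estimator_orig):
    """Checks whether the estimator can be cloned."""
    try:
        clone(estimator_orig)
    except Exception as e:
        raise AssertionError(f"Cloning of {name} failed with error: {e}.") from e


def check_estimator_repr(name, estimator_orig):
    """Check that the estimator has a functioning repr."""
    estimator = clone(estimator_orig)
    try:
        repr(estimator)
    except Exception as e:
        raise AssertionError(f"Repr of {name} failed with error: {e}.") from e


class BrokenRepr:
    """Hypothetical estimator whose __repr__ raises."""

    def __repr__(self):
        raise RuntimeError("boom")


# Cloning succeeds, repr fails: each problem is now reported by its own check.
for check in (check_estimator_cloneable, check_estimator_repr):
    try:
        check("BrokenRepr", BrokenRepr())
        print(check.__name__, "passed")
    except AssertionError as err:
        print(check.__name__, "failed:", err)
```

Splitting the checks this way means a broken repr no longer hides behind a generic "default constructible" failure.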

@@ -659,15 +659,6 @@ Even if it is not recommended, it is possible to override the method
any of the keys documented above is not present in the output of `_get_tags()`,
an error will occur.

In addition to the tags, estimators also need to declare any non-optional
Member Author

This is no longer required, and no replacement is needed since we now pass instances, not classes, to the estimator checks.

adrinjalali marked this pull request as ready for review September 4, 2024 13:50
glemaitre changed the title [WIP] TST remove _required_parameters and improve instance generation to TST remove _required_parameters and improve instance generation Sep 4, 2024
glemaitre self-requested a review September 4, 2024 14:36
Member

glemaitre left a comment

LGTM. Just one question about having both TEST_PARAMS and INIT_PARAMS: it should be possible to converge towards a single dictionary.

sklearn/base.py (resolved)
sklearn/tests/test_common.py (outdated, resolved)
@@ -304,48 +316,93 @@ def _generate_pipeline():
)


INIT_PARAMS = {
Member

The naming here is a bit confusing: we now have both TEST_PARAMS and INIT_PARAMS, and at first glance it is not easy to see why we need two dictionaries.

Why wouldn't INIT_PARAMS be enough to run all tests?

Member Author

I didn't want to change much behaviour in this PR. Merging the two would mean setting the parameters from TEST_PARAMS in all tests, which we don't do now. I'll see if anything fails if I merge them.
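The trade-off can be made concrete with a small, hypothetical sketch (the class, values, and merge policy below are assumptions, not what the PR does): if both dictionaries were folded into one with TEST_PARAMS taking precedence, every test would build the estimator with the overridden values, including tests that currently only see the INIT_PARAMS values.

```python
class ToyModel:
    """Hypothetical estimator, used only as a dictionary key."""


# Illustrative values only; the real dictionaries hold different entries.
INIT_PARAMS = {ToyModel: {"max_iter": 100, "tol": 1e-3}}
TEST_PARAMS = {ToyModel: {"max_iter": 5}}


def merged_params(cls):
    """One possible merge policy: test-time values override init values."""
    return {**INIT_PARAMS.get(cls, {}), **TEST_PARAMS.get(cls, {})}


# After merging, max_iter=5 would apply everywhere, not only in the
# tests that use TEST_PARAMS today.
print(merged_params(ToyModel))
```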

        cv=2,
        error_score="raise",
    ),
    HalvingRandomSearchCV: dict(
Member

You probably need to tweak the parameters for this one a bit more to avoid the current failure.

Member Author

This is odd: I couldn't reproduce it locally with a single job, but when running the tests in parallel I see the failure.

Member

Yep, this is weird. It looks like a side effect that changes the state of the random number generator, and thus the behaviour (or the data).
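A small illustration of the kind of side effect suspected here, assuming (this is not confirmed in the thread) that the failing check draws from shared global random state: when tests are distributed across workers, each worker runs a different set and order of tests than a single-process run, so anything that consumed the global RNG earlier changes the data later tests see. A private, seeded generator (for scikit-learn estimators, a fixed random_state) removes the dependence on order. The sketch uses Python's random module to stay dependency-free.

```python
import random


def make_data_global(n):
    # Draws from the module-level RNG: the result depends on what ran before.
    return [random.random() for _ in range(n)]


def make_data_local(n, seed=0):
    # Draws from a private, seeded RNG: the result ignores call order.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


random.seed(0)
a = make_data_global(3)

random.seed(0)
random.random()  # an unrelated "test" that ran earlier in the same worker
b = make_data_global(3)

print(a == b)  # False: global RNG state leaked between "tests"
print(make_data_local(3) == make_data_local(3))  # True
```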

adrinjalali (Member Author)

@OmarManzoor this should be a relatively easy second review.

Contributor

OmarManzoor left a comment

LGTM. Thanks @adrinjalali

sklearn/utils/_test_common/instance_generator.py (outdated, resolved)
sklearn/utils/tests/test_estimator_checks.py (outdated, resolved)
OmarManzoor enabled auto-merge (squash) September 6, 2024 06:15
OmarManzoor merged commit 95e9459 into scikit-learn:main Sep 6, 2024
28 checks passed
adrinjalali deleted the test/required_parameters branch September 6, 2024 07:22
Labels
Developer API Third party developer API related No Changelog Needed
3 participants