-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
TST remove _required_parameters and improve instance generation #29707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST remove _required_parameters and improve instance generation #29707
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre this is where I further the work on instance creation, which you also mentioned on the other PR.
@@ -430,15 +444,8 @@ def _get_check_estimator_ids(obj): | ||
return re.sub(r"\s", "", str(obj)) | ||
|
||
|
||
def _generate_column_transformer_instances(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't removed, it's now yielded beside all other estimators.
HalvingGridSearchCV: dict(cv=3), | ||
HalvingRandomSearchCV: dict(cv=3), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two are manually set in instance generation.
@@ -1320,6 +1320,21 @@ def get_metadata_routing(self): | ||
|
||
return router | ||
|
||
def _more_tags(self): | ||
return { | ||
"_xfail_checks": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnTransformer
is now tested along all other estimators,hence more checks are failing and need to be ignored, and fixed later.
@@ -379,6 +379,9 @@ def _more_tags(self): | ||
"Fail during parameter check since min/max resources requires" | ||
" more samples" | ||
), | ||
"check_estimators_nan_inf": "FIXME", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These estimators are also tested with others now, and these tests fail. Need fixes in another PR.
sklearn/pipeline.py
Outdated
@@ -1881,6 +1881,15 @@ def get_metadata_routing(self): | ||
|
||
return router | ||
|
||
def _more_tags(self): | ||
return { | ||
"_xfail_checks": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@@ -309,6 +311,8 @@ def _estimators_that_predict_in_fit(): | ||
"estimator", column_name_estimators, ids=_get_check_estimator_ids | ||
) | ||
def test_pandas_column_name_consistency(estimator): | ||
if isinstance(estimator, ColumnTransformer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this and test_check_param_validation
are being moved to estimator_checks
and therefore will be able to be skipped on the estimator tag, for now they need to be skiped hardcoded here. The PR moving these tests will fix this issue as well.
def check_estimator_cloneable(name, estimator_orig): | ||
"""Checks whether the estimator can be cloned.""" | ||
try: | ||
clone(estimator_orig) | ||
except Exception as e: | ||
raise AssertionError(f"Cloning of {name} failed with error: {e}.") from e | ||
|
||
|
||
def check_estimator_repr(name, estimator_orig): | ||
"""Check that the estimator has a functioning repr.""" | ||
estimator = clone(estimator_orig) | ||
try: | ||
repr(estimator) | ||
except Exception as e: | ||
raise AssertionError(f"Repr of {name} failed with error: {e}.") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two were inside check_parameters_default_constructible
and now moved outside in their own tests.
@@ -659,15 +659,6 @@ Even if it is not recommended, it is possible to override the method | ||
any of the keys documented above is not present in the output of `_get_tags()`, | ||
an error will occur. | ||
|
||
In addition to the tags, estimators also need to declare any non-optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now not required, and there's no need for a replacement since we now pass instances to estimator checks and not classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just one question for having both TEST_PARAMS
and INIT_PARAMS
. It should be possible to converge towards a single dictionary.
@@ -304,48 +316,93 @@ def _generate_pipeline(): | ||
) | ||
|
||
|
||
INIT_PARAMS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name here is a bit confusing because we have now TEST_PARAMS
and INIT_PARAMS
and at a first glance this is not really easy to know why we have 2 dictionary.
Why the INIT_PARAM
would not be enough to run all tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to change much behaviour with this PR. Merging the two would mean we'd be setting the parameters according to TEST_PARAMS
in all tests, while we're not doing now. I'll see if anything fails if I merge them.
cv=2, | ||
error_score="raise", | ||
), | ||
HalvingRandomSearchCV: dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably need to tweak a bit more the parameter for this one to avoid the current failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd, couldn't reproduce locally with single job, but running tests in parallel I see the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is weird, it looks like a side-effect that should change the state of the random number generator and thus the behaviour (or data).
@OmarManzoor this should be a relatively easy second review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @adrinjalali
This basically requires #29699 and #29702 to be merged first.
This PR refactors instance generation so that there is no more need for
_required_parameters
. This also means estimators are allowed to have init parameters with non-default values, which is already the case.