Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

MNT Improve error check_array error message when estimator is None #30485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 20, 2024

Conversation

taharallouche
Copy link
Contributor

@taharallouche taharallouche commented Dec 14, 2024

Reference Issues/PRs

I haven't found an issue that is dedicated to this.

What does this implement/fix? Explain your changes.

Hello, this is an attempt to make the error message that check_array emits when X.ndim > 2 in cases where allow_nd is False, and estimator is None.

Here's the current (on scikit-learn's main branch 6cccd99) error message:

X = np.array([[[1,2],[1,2]],[[1,2],[1,2]]])
check_array(X, allow_nd=False)

# ValueError: Found array with dim 3. None expected <= 2.

Here's the output on this branch:

X = np.array([[[1,2],[1,2]],[[1,2],[1,2]]])
check_array(X, allow_nd=False)

# ValueError: Found array with dim 3. Expected <= 2.

The case where estimator is not None is unchanged.

Copy link

github-actions bot commented Dec 14, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5d1dc4e. Link to the linter CI: here

Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @taharallouche,

thanks a lot for your PR. I like your initiative to make this error message clearer. 😃
I'm not a maintainer, but I am leaving you some comments to work with.

Comment on lines 1099 to 1104
if estimator_name is not None:
raise ValueError(
"Found array with dim %d. %s expected <= 2."
% (array.ndim, estimator_name)
)
raise ValueError("Found array with dim %d. Expected <= 2." % (array.ndim))
Copy link
Contributor

@StefanieSenger StefanieSenger Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already have something in place a few lines up where we define:

    estimator_name = _check_estimator_name(estimator)
    context = " by %s" % estimator_name if estimator is not None else ""

What about using the context variable in the error message?
The advantage would be not to require an additional conditional check.

Apart from this (and disregarding how we did this in the past), it would be nice to use f-strings here.

],
)
def test_check_array_allow_nd_errors(X, estimator, expectation) -> None:
with expectation:
Copy link
Contributor

@StefanieSenger StefanieSenger Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use the with pytest.raises context manager directly in the test. For consistence, would you mind doing this here, as well?

),
],
)
def test_check_array_allow_nd_errors(X, estimator, expectation) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to put type hints here.

Comment on lines 2383 to 2387
(
np.array([[1, 2], [3, 4]]),
None,
does_not_raise(),
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The does_not_raise for data with matching shapes is implicitly tested in the other tests, so I would think it is not necessary here.

@taharallouche
Copy link
Contributor Author

Thanks for taking the time to review this @StefanieSenger 🙏 I addressed all your suggestions

@StefanieSenger StefanieSenger added the Quick Review For PRs that are quick to review label Dec 18, 2024
Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @taharallouche. I think it looks great.

Two maintainers need to approve this PR.
I have set a quick review label, so it gets people's attention in this busy time.

I believe there is no changelog necessary.

@lesteve lesteve changed the title fix: check array error message if estimator is none MNT Improve error check_array error_message when allow_ndarray=False and estimator is None Dec 20, 2024
@lesteve lesteve changed the title MNT Improve error check_array error_message when allow_ndarray=False and estimator is None MNT Improve error check_array error_message when estimator is None Dec 20, 2024
@lesteve lesteve changed the title MNT Improve error check_array error_message when estimator is None MNT Improve error check_array error message when estimator is None Dec 20, 2024
sklearn/utils/tests/test_validation.py Outdated Show resolved Hide resolved
@lesteve
Copy link
Member

lesteve commented Dec 20, 2024

Thanks a lot for the improvement!

I set-up automerge so this PR will be merged when CI is green.

Side-comment: in general you don't need to merge the main branch into your PR branch. One case when you need to do it is when there are conflicts.

@lesteve lesteve merged commit 72b35a4 into scikit-learn:main Dec 20, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.