The Wayback Machine - https://web.archive.org/web/20221127024428/https://github.com/scikit-learn/scikit-learn/pull/14369

Nicer error in num_samples if shape is not valid and there's no __len__ #14369

Merged
merged 8 commits into scikit-learn:master on Jul 23, 2019

Conversation

Member

@amueller amueller commented Jul 14, 2019

This refactors _num_samples to raise a nice error message if X has no len.

This apparently happens with TensorFlow symbolic tensors, for example, but the clearer error also seems nicer in general.
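To make the described behaviour concrete, here is a minimal sketch. It is not scikit-learn's implementation: the names `num_samples_sketch` and `SymbolicLike` and the exact message text are assumptions for illustration. Only the idea comes from the description above, namely raising a `TypeError` with a descriptive message when the input has neither a usable `shape` nor a `__len__`.

```python
import numbers


def num_samples_sketch(x):
    """Return the number of samples in x, or raise a descriptive TypeError.

    Hypothetical helper for illustration; not the scikit-learn function.
    """
    # Build the message once so every failure path reports the same text.
    message = "Expected sequence or array-like, got %s" % type(x)

    # Estimator-like objects (anything with a callable fit) are rejected.
    if hasattr(x, "fit") and callable(x.fit):
        raise TypeError(message)

    # Prefer shape[0] when shape is a non-empty tuple with an integer first entry.
    shape = getattr(x, "shape", None)
    if isinstance(shape, tuple) and shape and isinstance(shape[0], numbers.Integral):
        return shape[0]

    # Fall back to len(); objects without __len__ get the descriptive error.
    if hasattr(x, "__len__"):
        return len(x)
    raise TypeError(message)


class SymbolicLike:
    """Stand-in for an object whose shape is unknown and that has no __len__."""

    shape = None
```

With this sketch, `num_samples_sketch([1, 2, 3])` returns the list length, while `num_samples_sketch(SymbolicLike())` raises a `TypeError` that names the offending type instead of failing inside `len()`.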

-        raise TypeError('Expected sequence or array-like, got '
-                        'estimator %s' % x)
+        raise TypeError(message)

Member

@jeremiedbb jeremiedbb Jul 15, 2019

Some tests of functions relying on _num_samples internally check that 'estimator' is in the error message.
We can either change the expected message in those tests or keep the special case when x is an estimator.
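The trade-off can be sketched as follows, assuming a hypothetical `count_samples` helper in place of `_num_samples` and a hypothetical `FakeModel` class. The sketch uses only the standard library rather than the project's test helpers, and the message text is an assumption.

```python
class FakeModel:
    """Hypothetical estimator-like object: it only needs a callable fit."""

    def fit(self, X, y=None):
        return self


def count_samples(x):
    # Hypothetical helper: estimator-like inputs get the generic message,
    # with no special wording for estimators.
    if hasattr(x, "fit") and callable(x.fit):
        raise TypeError(
            "Expected sequence or array-like, got %s" % type(x).__name__
        )
    return len(x)


def error_message_for(x):
    """Return the TypeError message raised for x, or None if none is raised."""
    try:
        count_samples(x)
    except TypeError as exc:
        return str(exc)
    return None


msg = error_message_for(FakeModel())

# A test pinned to the old wording looks for the word "estimator" ...
old_style_check = "estimator" in msg
# ... while a looser test relies only on the stable prefix of the message.
new_style_check = msg.startswith("Expected sequence or array-like")
```

Here `old_style_check` fails once the special wording is gone, which is why the tests either need updating or the estimator special case has to stay.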

Contributor

@glemaitre glemaitre Jul 16, 2019

And since they are public utils, we should not break backward compatibility.

Member Author

@amueller amueller Jul 16, 2019

I changed the test. @glemaitre are you saying changing the text of an error message is a backward-incompatible change? I don't think we need to be that strict and I don't think we have been that strict in the past.
I have no idea how I would deprecate the content of an error message.

Contributor

@glemaitre glemaitre Jul 16, 2019

I see that this is in validation.py, so it is not an issue as long as the failure is not raised within check_estimator.

My reasoning was linked to something that we did in the past: #13013 (comment)

Member Author

@amueller amueller Jul 16, 2019

I only had to change test_validation, not the estimator checks. That means it's not used in check_estimator, right? Otherwise those tests would be failing.

Contributor

@glemaitre glemaitre Jul 16, 2019

True

Member Author

@amueller amueller Jul 17, 2019

merge?

Member

@jeremiedbb jeremiedbb left a comment

lgtm.

I agree that a change in the error message should not be considered backward incompatible.

@@ -332,8 +332,7 @@ def test_check_array_on_mock_dataframe():
arr = np.array([[0.2, 0.7], [0.6, 0.5], [0.4, 0.1], [0.7, 0.2]])
mock_df = MockDataFrame(arr)
checked_arr = check_array(mock_df)
-    assert (checked_arr.dtype ==
-            arr.dtype)
+    assert (checked_arr.dtype == arr.dtype)
Member

@jeremiedbb jeremiedbb Jul 16, 2019

no need for parentheses here

Contributor

@glemaitre glemaitre Jul 18, 2019

@amueller if you can also address this one

return len(x)
else:

if hasattr(x, '__len__'):
Member

@thomasjpfan thomasjpfan Jul 18, 2019

It would be faster and more Pythonic to write:

try:
    return len(x)
except TypeError:
    raise TypeError(message)
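As a self-contained illustration of the suggestion above, the sketch below wraps the `try`/`except` form in a hypothetical `length_or_error` function. The function name, the two example classes, and the message text are assumptions. It also shows one behavioural difference from a `hasattr(x, '__len__')` check: the `try`/`except` form replaces the message even when `__len__` exists but itself raises `TypeError`.

```python
def length_or_error(x):
    """Return len(x), replacing any TypeError with a descriptive message."""
    message = "Expected sequence or array-like, got %s" % type(x).__name__
    try:
        return len(x)
    except TypeError:
        # Covers both a missing __len__ and a __len__ that raises TypeError.
        raise TypeError(message)


class NoLen:
    """Object without __len__; len() raises TypeError."""


class Unsized:
    """Object whose __len__ exists but refuses to answer."""

    def __len__(self):
        raise TypeError("len() of unsized object")
```

For `Unsized()`, `hasattr(x, '__len__')` is true, so a `hasattr`-based branch would let the original `len() of unsized object` error through, whereas `length_or_error` reports the descriptive message in both cases.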

@thomasjpfan thomasjpfan merged commit aa4f313 into scikit-learn:master Jul 23, 2019
Member

thomasjpfan commented Jul 23, 2019

Thank you @amueller!
