Nicer error in num_samples if shape is not valid and there's no __len__ #14369
-        raise TypeError('Expected sequence or array-like, got '
-                        'estimator %s' % x)
+        raise TypeError(message)
Some tests of functions relying on _num_samples internally check that 'estimator' is in the error message.
We can either change the expected message in those tests or keep the special case when x is an estimator.
And since they are public utils, we should not break backward compatibility.
I changed the test. @glemaitre are you saying changing the text of an error message is a backward-incompatible change? I don't think we need to be that strict and I don't think we have been that strict in the past.
I have no idea how I would deprecate the content of an error message.
I see that this is in validation.py, so it is not an issue as long as the failure is not raised within check_estimator.
My reasoning was linked to something that we did in the past: #13013 (comment)
I only had to change test_validation, not the estimator checks. That means it's not used in check_estimator, right? Otherwise those tests would be failing.
True
merge?
@@ -332,8 +332,7 @@ def test_check_array_on_mock_dataframe():
     arr = np.array([[0.2, 0.7], [0.6, 0.5], [0.4, 0.1], [0.7, 0.2]])
     mock_df = MockDataFrame(arr)
     checked_arr = check_array(mock_df)
-    assert (checked_arr.dtype ==
-            arr.dtype)
+    assert (checked_arr.dtype == arr.dtype)
no need for parentheses here
@amueller if you can also address this one
sklearn/utils/validation.py (outdated)
    return len(x)
    else:

    if hasattr(x, '__len__'):
Faster and more pythonic to:
    try:
        return len(x)
    except TypeError:
        raise TypeError(message)
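To make the difference between the two styles concrete, here is a minimal sketch comparing the hasattr check with the try/except form suggested above. The function names, the Unsized class, and the exact message text are hypothetical stand-ins for illustration, not the code in validation.py. One practical difference, beyond style: an object can define __len__ and still raise TypeError when asked for its length, and only the try/except form converts that case into the friendlier message.

```python
def num_samples_lbyl(x):
    """Hypothetical 'look before you leap' variant using hasattr."""
    message = 'Expected sequence or array-like, got %s' % type(x)
    if hasattr(x, '__len__'):
        return len(x)
    raise TypeError(message)


def num_samples_eafp(x):
    """Hypothetical try/except variant, as suggested in the review."""
    message = 'Expected sequence or array-like, got %s' % type(x)
    try:
        return len(x)
    except TypeError:
        raise TypeError(message)


class Unsized:
    """Hypothetical object: defines __len__ but refuses to report a length."""

    def __len__(self):
        raise TypeError('len() of unsized object')


def error_text(func, x):
    """Return the TypeError message raised by func(x), or None if it succeeds."""
    try:
        func(x)
    except TypeError as exc:
        return str(exc)
    return None


# Both variants agree on ordinary sequences.
print(num_samples_lbyl([1, 2, 3]), num_samples_eafp([1, 2, 3]))
# They differ when __len__ exists but raises: the hasattr variant leaks the raw error.
print(error_text(num_samples_lbyl, Unsized()))
print(error_text(num_samples_eafp, Unsized()))
```

With the hasattr variant the caller sees whatever message the object's own __len__ produced, while the try/except variant always reports the same "Expected sequence or array-like" text.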
Thank you @amueller!

This refactors _num_samples to raise a nice error message if X has no len. This happens apparently with tensorflow symbolic tensors, for example. But generally also seems nicer.
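The situation described here can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the actual body of _num_samples: count_samples, FakeArray, and FakeSymbolicTensor are hypothetical names, and FakeSymbolicTensor only imitates an object whose leading dimension is unknown and which has no __len__ (it is not a real tensorflow type).

```python
import numbers


def count_samples(x):
    """Hypothetical helper: use shape[0] when it is an integer, else fall back to len."""
    message = 'Expected sequence or array-like, got %s' % type(x)
    shape = getattr(x, 'shape', None)
    # Trust the shape only if its first entry is a real integer.
    if shape is not None and len(shape) > 0 and isinstance(shape[0], numbers.Integral):
        return shape[0]
    try:
        return len(x)
    except TypeError:
        # No usable shape and no usable len: report one consistent message.
        raise TypeError(message)


class FakeArray:
    """Hypothetical array-like with a valid shape and no __len__."""
    shape = (4, 2)


class FakeSymbolicTensor:
    """Hypothetical stand-in: unknown leading dimension and no __len__."""
    shape = (None, 3)


print(count_samples(FakeArray()))
try:
    count_samples(FakeSymbolicTensor())
except TypeError as exc:
    print(exc)
```

The first call returns the leading dimension from the shape. The second falls through to len, which fails, and the caller receives the "Expected sequence or array-like" message naming the offending type instead of a bare complaint about a missing len.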