Nicer error in num_samples if shape is not valid and there's no __len__ #14369

amueller · Jul 14, 2019

This refactors _num_samples to raise a nicer error message if X has no __len__.

This apparently happens with TensorFlow symbolic tensors, for example, but a clearer message seems better in general.
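
A minimal sketch of the behaviour described above, not the actual scikit-learn code. The names `num_samples_sketch` and `SymbolicLike` are hypothetical stand-ins; `SymbolicLike` only mimics an object whose `shape` has an unknown first dimension and that defines no `__len__`.

```python
import numbers


def num_samples_sketch(x):
    """Return the number of samples in x, or raise a readable TypeError.

    Hypothetical stand-in for illustration; not scikit-learn's _num_samples.
    """
    message = "Expected sequence or array-like, got %s" % type(x)

    shape = getattr(x, "shape", None)
    # Only trust shape[0] when it is a real integer.
    if shape is not None and len(shape) > 0:
        if isinstance(shape[0], numbers.Integral):
            return shape[0]

    # Fall back to len() and translate a failure into the clearer message.
    try:
        return len(x)
    except TypeError:
        raise TypeError(message)


class SymbolicLike:
    """Hypothetical object: shape is (None, 3) and __len__ is missing."""
    shape = (None, 3)


try:
    num_samples_sketch(SymbolicLike())
except TypeError as exc:
    error_text = str(exc)
```

With this sketch, the error names the offending type instead of surfacing a bare failure from `len()`.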

jeremiedbb · Jul 15, 2019

sklearn/utils/validation.py

-        raise TypeError('Expected sequence or array-like, got '
-                        'estimator %s' % x)
+        raise TypeError(message)
+


Some tests of functions relying on _num_samples internally check that 'estimator' is in the error message.
We can either change the expected message in those tests or keep the special case when x is an estimator.

And since these are public utils, we should not break backward compatibility.

I changed the test. @glemaitre are you saying changing the text of an error message is a backward-incompatible change? I don't think we need to be that strict and I don't think we have been that strict in the past.
I have no idea how I would deprecate the content of an error message.

I see that this is in validation.py, so it is not an issue if the failure is not raised within check_estimator.

My reasoning was linked to something that we did in the past: #13013 (comment)

I only had to change test_validation, not the estimator checks. That means it's not used in check_estimator, right? Otherwise those tests would be failing.
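
As an illustration of the point discussed in this thread, here is one generic way a test can match a stable part of an error message rather than a word such as 'estimator'. This is a sketch under assumptions, not the test from this PR: `check_raises_type_error`, `fake_num_samples`, and `DummyEstimator` are hypothetical names.

```python
import re


def check_raises_type_error(func, arg, pattern):
    """Return True if func(arg) raises TypeError whose text matches pattern."""
    try:
        func(arg)
    except TypeError as exc:
        return re.search(pattern, str(exc)) is not None
    return False


def fake_num_samples(x):
    # Hypothetical helper that always rejects its input, for illustration only.
    raise TypeError("Expected sequence or array-like, got %s" % type(x))


class DummyEstimator:
    def fit(self, X, y=None):
        return self


# Match the stable prefix, and escape the type repr so that characters such
# as "." are treated literally by the regular expression.
pattern = ("Expected sequence or array-like, got "
           + re.escape(str(DummyEstimator)))
matched = check_raises_type_error(fake_num_samples, DummyEstimator(), pattern)
```

A test written against the word 'estimator' would stop matching once the message reports the type instead, which is why the expected text had to change.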

jeremiedbb

lgtm.

I agree that a change in the error message should not be considered backward incompatible.

jeremiedbb · Jul 16, 2019

sklearn/utils/tests/test_validation.py

@@ -332,8 +332,7 @@ def test_check_array_on_mock_dataframe():
    arr = np.array([[0.2, 0.7], [0.6, 0.5], [0.4, 0.1], [0.7, 0.2]])
    mock_df = MockDataFrame(arr)
    checked_arr = check_array(mock_df)
-    assert (checked_arr.dtype ==
-                 arr.dtype)
+    assert (checked_arr.dtype == arr.dtype)


No need for parentheses here.

@amueller if you can also address this one

thomasjpfan · Jul 18, 2019

sklearn/utils/validation.py

-            return len(x)
-    else:
+
+    if hasattr(x, '__len__'):


Faster and more pythonic to:

    try:
        return len(x)
    except TypeError:
        raise TypeError(message)
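
To make the difference between the two styles concrete, here is a small sketch comparing a `hasattr` check with the try/except form suggested above. All names (`length_lbyl`, `length_eafp`, `UnknownLength`, `error_text`) are hypothetical, and `UnknownLength` is an assumed example of an object that defines `__len__` but cannot compute it.

```python
def length_lbyl(x, message):
    # "Look before you leap": test for __len__ first.
    if hasattr(x, "__len__"):
        return len(x)
    raise TypeError(message)


def length_eafp(x, message):
    # "Easier to ask forgiveness": call len() and translate the failure.
    try:
        return len(x)
    except TypeError:
        raise TypeError(message)


class UnknownLength:
    """Hypothetical object that defines __len__ but cannot compute it."""

    def __len__(self):
        raise TypeError("length is not known")


def error_text(func, x):
    """Return the TypeError text raised by func, or None if it succeeds."""
    try:
        func(x, "Expected sequence or array-like")
    except TypeError as exc:
        return str(exc)
    return None
```

Both helpers agree on ordinary inputs, but only the try/except form replaces the error raised from inside `__len__` with the clearer message.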

thomasjpfan · Jul 23, 2019

Thank you @amueller!

amueller added 4 commits Jul 14, 2019

add test for weird types

1b5b9f5

nicer error message if there's no len

dd83ba1

refactor error message slightly

699bbc8

use type

f563113

robieta mentioned this pull request Jul 15, 2019

pass sample weight into py_func tensorflow/tensorflow#28619

Closed

jeremiedbb reviewed Jul 15, 2019

amueller added 3 commits Jul 16, 2019

fix regex in test

ec8262e

pep8

3aeb119

pep8 some more

716f114

jeremiedbb approved these changes Jul 16, 2019

ruidazeng approved these changes Jul 17, 2019

thomasjpfan reviewed Jul 18, 2019

address comments

d4900d9

thomasjpfan approved these changes Jul 23, 2019

thomasjpfan merged commit aa4f313 into scikit-learn:master Jul 23, 2019

jeremiedbb Jul 15, 2019

glemaitre Jul 16, 2019

amueller Jul 16, 2019

glemaitre Jul 16, 2019

amueller Jul 16, 2019

glemaitre Jul 16, 2019

amueller Jul 17, 2019

jeremiedbb Jul 16, 2019

glemaitre Jul 18, 2019

thomasjpfan Jul 18, 2019
