-
-
Notifications
You must be signed in to change notification settings - Fork 11k
ENH: Ensure lib._format_impl.read_array
handles file reading errors.
#28330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If we add a custom error, please also add a test. Also, if we already add a custom error message, I feel we can do better to inform the user about the why and also not drop the potentially useful information about the sizes. |
lib._format_impl.read_array
handles file reading errors.lib._format_impl.read_array
handles file reading errors.
Thanks for your feedback. |
fc3fcec
to
b7f9d50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, might suggest some word-smithing if I have an idea, but happy to put this in with some minor cleanups.
numpy/lib/_format_impl.py
Outdated
@@ -872,6 +872,9 @@ def read_array(fp, allow_pickle=False, pickle_kwargs=None, *, | ||
data = _read_bytes(fp, read_size, "array data") | ||
array[i:i + read_count] = numpy.frombuffer(data, dtype=dtype, | ||
count=read_count) | ||
|
||
if array.size != count: | ||
raise ValueError(f"Failed to read all data for array. Expected {shape} = {count} elements, read {array.size} elements.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost think an IOError
would be nice, but let's stick with ValueError
, since that was the old one.
I like this a lot more, thanks! Thinking if we can polish it a bit, but maybe this is already it. Maybe "could only read ..."
or add something like: (file seems not fully written)
or so? Thinking about when this happens, and I guess it is usually if for some reason the file wasn't written completely (disk space/error or just forgetting to flush if reading what another program just wrote).
Mainly, the line is also too long.
@charris do you happen to know right away why this is not caught by the linter CI job?
numpy/lib/tests/test_format.py
Outdated
f.truncate() | ||
with open(path, 'rb') as f: | ||
arr2 = format.read_array(f) | ||
return arr2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this into the test. Also use the tmppath
fixture (i.e. a "magic" argument that pytest automatically passes if it finds it).
numpy/lib/tests/test_format.py
Outdated
def test_file_truncated(): | ||
for arr in basic_arrays: | ||
if arr.dtype != object: | ||
assert_raises(ValueError, file_truncated, arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this file uses assert_raises
, but let's prefer with pytest.raises(ValueError, match=...):
.
Also, please make sure to only include the actual reading inside the with statement. (with match=
its pretty safe, but this way a ValueError
elsewhere might mean the test does nothing easily): the try/except
should be as specific as is easy.
985f8e4
to
2f54a45
Compare
Thanks @WAAutoMaton seems like a reasonable small improvemnt now, let's just put this in. |
When a user uses
numpy.load(filename)
to read a.npy
array file, and an error occurs during the reading operation (for example, if a 1024MiB file is missing the last 512MiB content), it will produce the following error:This error can be confusing to the user, as it does not indicate the actual cause of the problem.
This PR checks if the size of the array returned by numpy.fromfile in the read_array method matches the expected size. If it doesn't, it explicitly informs the user of the reason for the error: