Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH: Ensure lib._format_impl.read_array handles file reading errors. #28330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 9, 2025

Conversation

WAAutoMaton
Copy link
Contributor

@WAAutoMaton WAAutoMaton commented Feb 13, 2025

When a user uses numpy.load(filename) to read a .npy array file, and an error occurs during the reading operation (for example, if a 1024MiB file is missing the last 512MiB content), it will produce the following error:

  File ".../numpy/lib/_format_impl.py", line 883, in read_array
    array.shape = shape
    ^^^^^^^^^^^
ValueError: cannot reshape array of size 65535984 into shape (128,1048576)

This error can be confusing to the user, as it does not indicate the actual cause of the problem.

This PR checks if the size of the array returned by numpy.fromfile in the read_array method matches the expected size. If it doesn't, it explicitly informs the user of the reason for the error:

  File ".../numpy/lib/_format_impl.py", line 877, in read_array
    raise ValueError(f"Failed to read all data for array. Expected {shape} = {count} elements, read {array.size} elements.")
ValueError: Failed to read all data for array. Expected (128, 1048576) = 134217728 elements, read 67108848 elements.

@seberg
Copy link
Member

seberg commented Feb 25, 2025

If we add a custom error, please also add a test. Also, if we already add a custom error message, I feel we can do better to inform the user about the why and also not drop the potentially useful information about the sizes.

@seberg seberg changed the title BUG: Ensure lib._format_impl.read_array handles file reading errors. ENH: Ensure lib._format_impl.read_array handles file reading errors. Feb 25, 2025
@WAAutoMaton
Copy link
Contributor Author

If we add a custom error, please also add a test. Also, if we already add a custom error message, I feel we can do better to inform the user about the why and also not drop the potentially useful information about the sizes.

Thanks for your feedback.
I have tried to write a test and hope it meets the requirements.

@WAAutoMaton WAAutoMaton force-pushed the handle-file-read-error branch 2 times, most recently from fc3fcec to b7f9d50 Compare March 1, 2025 12:49
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, might suggest some word-smithing if I have an idea, but happy to put this in with some minor cleanups.

@@ -872,6 +872,9 @@ def read_array(fp, allow_pickle=False, pickle_kwargs=None, *,
data = _read_bytes(fp, read_size, "array data")
array[i:i + read_count] = numpy.frombuffer(data, dtype=dtype,
count=read_count)

if array.size != count:
raise ValueError(f"Failed to read all data for array. Expected {shape} = {count} elements, read {array.size} elements.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost think an IOError would be nice, but let's stick with ValueError, since that was the old one.

I like this a lot more, thanks! Thinking if we can polish it a bit, but maybe this is already it. Maybe "could only read ..." or add something like: (file seems not fully written) or so? Thinking about when this happens, and I guess it is usually if for some reason the file wasn't written completely (disk space/error or just forgetting to flush if reading what another program just wrote).


Mainly, the line is also too long.

@charris do you happen to know right away why this is not caught by the linter CI job?

f.truncate()
with open(path, 'rb') as f:
arr2 = format.read_array(f)
return arr2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this into the test. Also use the tmppath fixture (i.e. a "magic" argument that pytest automatically passes if it finds it).

def test_file_truncated():
for arr in basic_arrays:
if arr.dtype != object:
assert_raises(ValueError, file_truncated, arr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this file uses assert_raises, but let's prefer with pytest.raises(ValueError, match=...):.

Also, please make sure to only include the actual reading inside the with statement. (with match= its pretty safe, but this way a ValueError elsewhere might mean the test does nothing easily): the try/except should be as specific as is easy.

@WAAutoMaton WAAutoMaton force-pushed the handle-file-read-error branch from 985f8e4 to 2f54a45 Compare March 9, 2025 03:42
numpy/lib/tests/test_format.py Outdated Show resolved Hide resolved
@seberg
Copy link
Member

seberg commented Mar 9, 2025

Thanks @WAAutoMaton seems like a reasonable small improvemnt now, let's just put this in.

@seberg seberg merged commit d8cba36 into numpy:main Mar 9, 2025
68 of 69 checks passed
@github-project-automation github-project-automation bot moved this from Awaiting a code review to Completed in NumPy first-time contributor PRs Mar 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.