ENH: Ensure `lib._format_impl.read_array` handles file reading errors. #28330

WAAutoMaton · Feb 13, 2025

When a user calls numpy.load(filename) to read a .npy array file and the read comes up short (for example, a 1024 MiB file is missing its last 512 MiB of content), it produces the following error:

  File ".../numpy/lib/_format_impl.py", line 883, in read_array
    array.shape = shape
    ^^^^^^^^^^^
ValueError: cannot reshape array of size 65535984 into shape (128,1048576)

This error can be confusing to the user, as it does not indicate the actual cause of the problem.

This PR checks if the size of the array returned by numpy.fromfile in the read_array method matches the expected size. If it doesn't, it explicitly informs the user of the reason for the error:

  File ".../numpy/lib/_format_impl.py", line 877, in read_array
    raise ValueError(f"Failed to read all data for array. Expected {shape} = {count} elements, read {array.size} elements.")
ValueError: Failed to read all data for array. Expected (128, 1048576) = 134217728 elements, read 67108848 elements.
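
For readers who want to reproduce the situation described above, here is a minimal sketch. It assumes a much smaller array than the 1024 MiB file in the description, and the helper name `load_truncated_npy` is hypothetical. Depending on the installed NumPy version, the message should be either the reshape error or the explicit one shown above; in both cases the exception type is ValueError.

```python
import os
import tempfile

import numpy as np


def load_truncated_npy(keep_fraction=0.5):
    """Save a small array, cut off the tail of the file, and try to load it.

    Hypothetical helper for illustration. Returns the ValueError raised by
    numpy.load, or None if loading unexpectedly succeeded.
    """
    arr = np.arange(128 * 1024, dtype=np.int64).reshape(128, 1024)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "truncated.npy")
        np.save(path, arr)

        # Drop the tail of the file to simulate an incomplete write.
        # The header is small, so half the file still contains it in full.
        full_size = os.path.getsize(path)
        with open(path, "r+b") as f:
            f.truncate(int(full_size * keep_fraction))

        try:
            np.load(path)
        except ValueError as exc:
            return exc
    return None
```

Calling `print(load_truncated_npy())` shows which of the two messages the local NumPy build emits.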

seberg · Feb 25, 2025

If we add a custom error, please also add a test. Also, if we already add a custom error message, I feel we can do better to inform the user about the why and also not drop the potentially useful information about the sizes.

WAAutoMaton · Mar 1, 2025

> If we add a custom error, please also add a test. Also, if we already add a custom error message, I feel we can do better to inform the user about the why and also not drop the potentially useful information about the sizes.

Thanks for your feedback.
I have tried to write a test and hope it meets the requirements.

seberg

Thanks, might suggest some word-smithing if I have an idea, but happy to put this in with some minor cleanups.

seberg · Mar 2, 2025

numpy/lib/_format_impl.py

@@ -872,6 +872,9 @@ def read_array(fp, allow_pickle=False, pickle_kwargs=None, *,
                    data = _read_bytes(fp, read_size, "array data")
                    array[i:i + read_count] = numpy.frombuffer(data, dtype=dtype,
                                                             count=read_count)
+
+        if array.size != count:
+            raise ValueError(f"Failed to read all data for array. Expected {shape} = {count} elements, read {array.size} elements.")


Almost think an IOError would be nice, but let's stick with ValueError, since that was the old one.

I like this a lot more, thanks! Thinking if we can polish it a bit, but maybe this is already it. Maybe "could only read ..." or add something like: (file seems not fully written) or so? Thinking about when this happens, and I guess it is usually if for some reason the file wasn't written completely (disk space/error or just forgetting to flush if reading what another program just wrote).

Mainly, the line is also too long.

@charris do you happen to know right away why this is not caught by the linter CI job?
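
One way the suggestions in this comment could fit together is sketched below: the message is split across several short string literals so no source line is too long, it says "could only read", and it hints at an incompletely written file. The helper `check_element_count` is hypothetical and the wording is only a guess at the kind of message discussed here, not necessarily the text that was merged.

```python
def check_element_count(shape, expected_count, actual_count):
    """Raise ValueError if fewer elements were read than the header promised.

    Hypothetical helper for illustration only; the real check lives inline
    in read_array and its final wording may differ.
    """
    if actual_count != expected_count:
        # Adjacent string literals are concatenated, which keeps every
        # source line short without changing the resulting message.
        raise ValueError(
            "Failed to read all data for array. "
            f"Expected {shape} = {expected_count} elements, "
            f"could only read {actual_count} elements. "
            "(file seems not fully written?)"
        )
```

Keeping ValueError rather than switching to IOError means existing callers that already catch the old reshape error keep working.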

seberg · Mar 2, 2025

numpy/lib/tests/test_format.py

+            f.truncate()
+        with open(path, 'rb') as f:
+            arr2 = format.read_array(f)
+        return arr2


Please move this into the test. Also use the tmp_path fixture (i.e. a "magic" argument that pytest automatically passes when it finds it in the test's signature).

seberg · Mar 2, 2025

numpy/lib/tests/test_format.py

+def test_file_truncated():
+    for arr in basic_arrays:
+        if arr.dtype != object:
+            assert_raises(ValueError, file_truncated, arr)


I realize this file uses assert_raises, but let's prefer `with pytest.raises(ValueError, match=...):`.

Also, please make sure to only include the actual reading inside the with statement. (With match= it's pretty safe, but as written, a ValueError raised elsewhere could easily mean the test checks nothing.) The try/except should be as specific as is easy.
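
A sketch of what a test following these review points might look like, assuming a single small float array instead of the `basic_arrays` loop from the diff. It uses the `tmp_path` fixture, keeps the file setup outside the `pytest.raises` block, and uses a deliberately loose `match=` pattern so that it accepts both the old reshape message and the new "Failed to read" message quoted in the description. The exact body of the merged test is not shown in this thread, so this is illustrative only.

```python
import numpy as np
import pytest
from numpy.lib import format as npy_format


def test_file_truncated(tmp_path):
    path = tmp_path / "truncated.npy"
    arr = np.arange(12, dtype=np.float64).reshape(3, 4)

    # Setup: write a complete file, then cut off its final byte.
    with open(path, "wb") as f:
        npy_format.write_array(f, arr)
    with open(path, "r+b") as f:
        f.truncate(path.stat().st_size - 1)

    with open(path, "rb") as f:
        # Only the read itself sits inside pytest.raises, so a ValueError
        # from the setup above cannot make the test pass by accident.
        # Assumption: the loose pattern covers both message variants.
        with pytest.raises(ValueError, match="read|reshape"):
            npy_format.read_array(f)
```

Once the final wording of the message is settled, the pattern can be tightened to match it exactly.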

seberg · Mar 9, 2025

Thanks @WAAutoMaton, seems like a reasonable small improvement now, let's just put this in.

github-actions bot added the 00 - Bug label Feb 13, 2025

tylerjereddy added the component: numpy.lib label Feb 13, 2025

seberg added 01 - Enhancement and removed 00 - Bug labels Feb 25, 2025

seberg changed the title ~~BUG: Ensure lib._format_impl.read_array handles file reading errors.~~ ENH: Ensure lib._format_impl.read_array handles file reading errors. Feb 25, 2025

WAAutoMaton force-pushed the handle-file-read-error branch 2 times, most recently from fc3fcec to b7f9d50 Compare March 1, 2025 12:49

seberg reviewed Mar 2, 2025

WAAutoMaton added 5 commits March 9, 2025 11:40

BUG: Ensure lib._format_impl.read_array handles file reading errors.

4a13478

ENH: improve error messages in lib._format_impl.read_array

5c7cf8e

TST: add tests for file read failures in read_array

544f4cb

ENH: improve the phrasing of the error message in ``lib._format_impl.read_array``

0c2df9d

TST: fix some issues in test_file_truncated

2f54a45

WAAutoMaton force-pushed the handle-file-read-error branch from 985f8e4 to 2f54a45 Compare March 9, 2025 03:42

seberg reviewed Mar 9, 2025

Update numpy/lib/tests/test_format.py

ef7a209

seberg merged commit d8cba36 into numpy:main Mar 9, 2025
68 of 69 checks passed

github-project-automation bot moved this from Awaiting a code review to Completed in NumPy first-time contributor PRs Mar 9, 2025

rkern mentioned this pull request Jun 17, 2025

BUG: load no longer reads files with shape=(-1,...) #29217

Open
