-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Allow empty memmaps in most situations #27723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
ced32b7
BUG: error empty memmap offset is a multiple of allocation granularity
0a141cd
DOC: added a code comment explaining issue with `bytes==0` in memmap
ef2fefb
TST: test for arr.offset being correct
698ce79
ENH: allow empty memmap
2c803f8
TST: adjust test for empty memmap
588bed6
STY: numpy/_core/tests/test_memmap.py
laarohi 2684626
STY: Update numpy/_core/tests/test_memmap.py
laarohi 9acace7
STY: Update numpy/_core/tests/test_memmap.py
laarohi 99b7afa
TST: Update numpy/_core/tests/test_memmap.py
laarohi a9f30c7
DOC: added enhancement release note for gh-27723
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
* Improved support for empty `memmap`. Previously an empty `memmap` would fail | ||
unless a non-zero ``offset`` was set. Now a zero-size `memmap` is supported | ||
even if ``offset=0``. To achieve this, if a `memmap` is mapped to an empty | ||
file that file is padded with a single byte. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs more explanation, I think. To me it also looks like this will simply fail if you pass
offset=0
. Even if I am missing something, that means it needs testing.I.e. it looks a bit like it fixes your specific issue, but may not improve things generally.
It seems like a 0 size memmap is quite fraught to begin with, so I am not sure what is right. Setting the
bytes
to be at least 1 may make sense, but on linux may also fail if the size is actually 0 it seems (so at the very least it would also lead to a regressions in some places).Of course sure, if an offset is used, that offset can be adjusted so that we don't need to "read" a byte beyond the end.
EDIT: Ack, sorry I guess the
start > 0
prevents things, but it still would be nice to be clear that this only works with offsets. And it would be good to think about what to do about the other case.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I could have been more clear.
So just to clarify, on Linux (haven't tested on Windows), in the current numpy implementation:
offset > 0 and offset % mmap.ALLOCATIONGRANULARITY != 0
offset = 0
offset > 0 and offset % mmap.ALLOCATIONGRANULARITY == 0
With the suggested fix:
offset > 0
offset = 0
I agree that a 0 size memmap is quite fraught, but it would be nice if the behaviour of a 0 size memmap was independent of whether the offset was a multiple of the allocation granularity or not.
An alternative solution would be to not allow 0 size memmaps at all.
I do see some logic as to why it makes sense to allow a 0 size memmap with
offset > 0
as this offset typically implies that the file contains some sort of metadata besides for the 0 size array. For a 0 size memmap withoffset = 0
the file would be completely empty and thus there is nothing to map at all (mmap.mmap
withlength=0
even fails on Windows for an empty file (see here)).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeay, thanks. I have two asks:
arr.offset
being correct, even though you are messing around with the start. (I suspect it already is.)bytes=0
being problematic but we can avoid it easily ifoffset!=0
. It can reference to this PRgh-27723
.But, maybe we can rethink the
bytes=0
part as well? It seems to me that it is impossible to memory map an empty file on all systems. In other words, mapping 0 bytes works on linux probably, but only because it maps the full file and then reads nothing.I think that means that we can:
bytes=1
ifoffset=0
offset > 0
in order to not grow the file unnecessarily. Although, I don't think it would matter too much, that seems pretty safe to me.Or am I missing some path where
if bytes == 0 and offset == 0: bytes = 1
might backfire?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a test for
arr.offset == offset
, please confirm that this is what you had in mind as the test seems a bit trivial to me (but it could very well be that I misunderstood what you were asking).I've also added a code comment as asked.
I also had a think about the 0 size
memmap
withoffset=0
and your suggestion to usebytes=1
makes sense. I've implemented it in such a way that it requiresmode in ('w+', 'r+')
since we will be writing a byte to the file. There was an existing test for an emptymemmap
which I have adjusted accordingly.Have a look at the changes and let me know what you think.