Description
Feature or enhancement
Proposal:
Introduction
Memory buffers used by native libraries often come with alignment requirements, such as page alignment. If an API dealing with such buffers is exposed into Python user code, the Python side is now responsible for providing a suitably aligned memory buffer. However, there is no way of properly specifying the alignment of any object supporting the buffer protocol, neither high-level ones like bytes and bytearray, nor lower-level ones like ctypes arrays and mmap.
Over 10 years ago, a StackOverflow question was asked about this exact problem, and one answer provides the most compact known workaround that works across many Python versions. The workaround also has to deal with a second problem: obtaining the memory address of a Python buffer is nontrivial.
import ctypes

size = getBufferSizeFromSomewhere()
alignment = getBufferAlignmentFromSomewhere()
# Maximum amount of extra memory required - we can't know exactly before allocating!
requiredOversize = size + alignment - 1
oversizeBuffer = bytearray(requiredOversize)
# Get the address of the bytearray's backing buffer
oversizeCType = ctypes.c_char * requiredOversize
oversizeMemory = oversizeCType.from_buffer(oversizeBuffer)
oversizeAddress = ctypes.addressof(oversizeMemory)
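# Calculate required offset into oversized buffer to reach proper alignment
# (the outer modulo yields 0 instead of `alignment` when the address is already aligned)
offsetToAligned = (alignment - oversizeAddress % alignment) % alignment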
# Create a raw C buffer with offset to not copy the byte array in the process
bufferCType = ctypes.c_char * (requiredOversize - offsetToAligned)
correctRawCMemory = bufferCType.from_buffer(oversizeMemory, offsetToAligned)
# Now use correctRawCMemory...
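The offset arithmetic in this workaround has an edge case when the backing buffer already happens to be aligned. A minimal sketch with made-up addresses; the helper name padding_to_alignment is hypothetical and not part of any library:

```python
def padding_to_alignment(address, alignment):
    """Return how many bytes to skip from address to reach a multiple of alignment."""
    return (-address) % alignment


# Already aligned: nothing needs to be skipped.
print(padding_to_alignment(0x2000, 4096))  # 0
# One byte past a boundary: almost a full alignment unit is skipped.
print(padding_to_alignment(0x2001, 4096))  # 4095
```

The simpler expression alignment - address % alignment would return 4096 in the first case, which leaves fewer than size usable bytes in a buffer that was oversized by only alignment - 1.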
The C API gained PyMem_AlignedAlloc() in 3.7, and NumPy also had a feature request accepted for aligned arrays.
Proposal
Note that I would like to find a better place for an aligned memory API, but I have not found one so far. See "Alternatives" below.
On all platforms supported by the mmap module, add a new parameter align to the mmap.mmap function.
# Windows
mmap.mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT, align=1[, offset])
# Unix
mmap.mmap(fileno, length, flags=MAP_SHARED, prot=PROT_WRITE|PROT_READ, access=ACCESS_DEFAULT, align=1[, offset])
If the mapping is a named file mapping (positive file descriptor number), the file's contents are mapped into memory starting from an address that is a multiple of align. If the mapping is an anonymous mapping (file descriptor -1), memory is allocated at an address that is a multiple of align. The default of 1 guarantees backwards compatibility, since it allows mapping to any address as before. The offset parameter only affects the start offset into the source data and has no effect on the starting address of the mapping.
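To make the intended semantics concrete, here is a rough sketch of how the anonymous-mapping case can be emulated today in pure Python by over-allocating and slicing. The helpers address_of and aligned_anonymous_map are hypothetical names introduced for illustration; they are not part of the mmap module or of this proposal.

```python
import ctypes
import mmap


def address_of(buf):
    """Return the start address of a non-empty, writable buffer object."""
    # On CPython the temporary ctypes object is dropped as soon as this
    # call returns, so it does not keep the buffer exported.
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def aligned_anonymous_map(length, align):
    """Emulate the proposed align parameter for an anonymous mapping.

    Returns (mapping, view), where view is a writable memoryview of
    `length` bytes whose start address is a multiple of `align`.
    """
    if length < 1 or align < 1:
        raise ValueError("length and align must be positive")
    # Over-allocate so that an aligned start address must exist inside.
    mapping = mmap.mmap(-1, length + align - 1)
    offset = (-address_of(mapping)) % align
    view = memoryview(mapping)[offset:offset + length]
    return mapping, view


mapping, view = aligned_anonymous_map(4096, 1 << 16)
print(address_of(view) % (1 << 16))  # 0
view[:5] = b"hello"
```

This only covers the anonymous case and wastes up to align - 1 bytes per mapping, which is part of what a built-in parameter could avoid.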
Alternatives
Add memory alignment control to ctypes instead of mmap
The only place where I can see this would fit is ctypes._CData.from_buffer_copy. Since ctypes does not provide a direct allocation API, the buffer needs to be allocated twice: once by the buffer provider (e.g. bytearray), and once for copying by ctypes. It also seems like a less convenient and obvious API than the other proposals.
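The double allocation can be seen with the existing from_buffer_copy API. The snippet below is only an illustration of today's behavior, with an arbitrary example payload; it does not show any proposed alignment parameter.

```python
import ctypes

# First allocation: the bytearray owns its own backing memory.
source = bytearray(b"native buffer contents")

# Second allocation: ctypes copies the bytes into memory that it owns.
ArrayType = ctypes.c_char * len(source)
copied = ArrayType.from_buffer_copy(source)

# Mutating the source afterwards does not affect the ctypes copy,
# which confirms the two objects do not share memory.
source[0:6] = b"NATIVE"
print(copied.raw)     # b'native buffer contents'
print(bytes(source))  # b'NATIVE buffer contents'
```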
Add memory alignment control to bytearray
It is debatable whether adding such a low-level control feature to a high-level API like bytearray is a good design. However, this would make the feature available on Emscripten and WASI.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response