Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH: support no-copy pickling for any array that can be transposed to a C-contiguous array #28105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 48 commits into from
May 14, 2025
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
0dcde49
add support for transposed contiguous array and corresponding test
IndifferentArea Jan 6, 2025
a27f5eb
Merge branch 'numpy:main' into GH-26878
IndifferentArea Jan 6, 2025
50be134
fix lint
IndifferentArea Jan 6, 2025
7cea83c
rm bad includes
IndifferentArea Jan 6, 2025
bf5ecaa
fix
IndifferentArea Jan 6, 2025
a566660
fix
IndifferentArea Jan 6, 2025
63da2b9
discard vla
IndifferentArea Jan 6, 2025
9494672
accept legacy pkl
IndifferentArea Jan 9, 2025
6b9421a
check return
IndifferentArea Jan 9, 2025
6ca72f1
fix ci
IndifferentArea Jan 9, 2025
0f70349
check after transpose
IndifferentArea Jan 9, 2025
c045bd9
discard auto include
IndifferentArea Jan 9, 2025
aa357a9
use N format in pybuildvalue
IndifferentArea Jan 10, 2025
aee895b
use pytuple pack
IndifferentArea Jan 10, 2025
1b2f9d6
clean up
IndifferentArea Jan 10, 2025
4a16cd3
add comment on fall back behavior
IndifferentArea Jan 10, 2025
449ea59
add mroe test
IndifferentArea Jan 10, 2025
f0d528a
use byte str instead of pkl to test
IndifferentArea Jan 10, 2025
a8cc7fc
get shape with pyarray api
IndifferentArea Jan 10, 2025
5f88eb3
fmt
IndifferentArea Jan 10, 2025
4ba4c55
better comment
IndifferentArea Jan 10, 2025
de8899a
fix as suggested
IndifferentArea Jan 10, 2025
4be8c0c
fix lint
IndifferentArea Jan 10, 2025
e8ddf99
discard auto include
IndifferentArea Jan 10, 2025
3ee9178
shorter tested strs
IndifferentArea Jan 13, 2025
4783c41
fmt
IndifferentArea Jan 13, 2025
d2a8306
last correct commit
IndifferentArea Jan 13, 2025
94a297a
last correct commit
IndifferentArea Jan 13, 2025
e17acbc
[skip ci] memory leak
IndifferentArea Jan 13, 2025
7fd5037
add cleanup
IndifferentArea Jan 13, 2025
16e7067
[skip ci] add release doc
IndifferentArea Jan 13, 2025
7982184
[skip ci] add check for pylong_fromlong
IndifferentArea Jan 13, 2025
d635e65
typo
IndifferentArea Jan 13, 2025
aaa9b9f
fix memory leak
IndifferentArea Jan 14, 2025
ecb863f
more clean up
IndifferentArea Jan 14, 2025
bfbde0e
use same random seed
IndifferentArea Jan 27, 2025
35ea29f
use xdecref
IndifferentArea Jan 27, 2025
596db8c
check PyArray_Transpose
IndifferentArea Jan 27, 2025
1a82f4e
check PyArray_IntTupleFromIntp
IndifferentArea Jan 27, 2025
80ba96d
fix
IndifferentArea Feb 2, 2025
792468a
once-over
IndifferentArea Feb 2, 2025
c213b37
add additional test
IndifferentArea Feb 2, 2025
3174d5e
once over again
IndifferentArea Feb 2, 2025
43e214a
fmt
IndifferentArea Mar 26, 2025
42e1c99
pickle format consistency
IndifferentArea Mar 26, 2025
f71549f
fix incorrect usage of cpy api
IndifferentArea Mar 26, 2025
d5c2215
fmt
IndifferentArea Mar 26, 2025
8b981dc
Merge branch 'main' into GH-26878
IndifferentArea Mar 26, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion 4 numpy/_core/numeric.py
Original file line number Diff line number Diff line change
Expand Up @@ -1920,7 +1920,9 @@ def fromfunction(function, shape, *, dtype=float, like=None, **kwargs):
_fromfunction_with_like = array_function_dispatch()(fromfunction)


def _frombuffer(buf, dtype, shape, order):
def _frombuffer(buf, dtype, shape, order, axis=None):
IndifferentArea marked this conversation as resolved.
Show resolved Hide resolved
if order == 'K' and axis is not None:
return frombuffer(buf, dtype=dtype).reshape(shape, order='C').transpose(axis)
return frombuffer(buf, dtype=dtype).reshape(shape, order=order)


Expand Down
55 changes: 43 additions & 12 deletions 55 numpy/_core/src/multiarray/methods.c
Original file line number Diff line number Diff line change
Expand Up @@ -1866,6 +1866,7 @@ array_reduce_ex_picklebuffer(PyArrayObject *self, int protocol)
PyObject *picklebuf_args = NULL;
PyObject *buffer = NULL, *transposed_array = NULL;
PyArray_Descr *descr = NULL;
PyObject *rev_perm = NULL; // only used in 'K' order
char order;

descr = PyArray_DESCR(self);
Expand All @@ -1882,19 +1883,46 @@ array_reduce_ex_picklebuffer(PyArrayObject *self, int protocol)
}

/* Construct a PickleBuffer of the array */

if (!PyArray_IS_C_CONTIGUOUS((PyArrayObject*) self) &&
PyArray_IS_F_CONTIGUOUS((PyArrayObject*) self)) {
if (PyArray_IS_C_CONTIGUOUS((PyArrayObject *)self)) {
order = 'C';
picklebuf_args = Py_BuildValue("(O)", self);
rev_perm = Py_BuildValue("O", Py_None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might use PyTuple_Pack() where possible (even refactor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, you mean construct an empty tuple instead of None to rev_perm on c_contiguous and f_contiguous path?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I would avoid Py_BuildValue when it is easy to do so, and it is here. But also OK to keep: this is inherited.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied as you suggested.

}
else if (PyArray_IS_F_CONTIGUOUS((PyArrayObject *)self)) {
/* if the array if Fortran-contiguous and not C-contiguous,
* the PickleBuffer instance will hold a view on the transpose
* of the initial array, that is C-contiguous. */
order = 'F';
transposed_array = PyArray_Transpose((PyArrayObject*)self, NULL);
transposed_array = PyArray_Transpose((PyArrayObject *)self, NULL);
ngoldbaum marked this conversation as resolved.
Show resolved Hide resolved
picklebuf_args = Py_BuildValue("(N)", transposed_array);
rev_perm = Py_BuildValue("O", Py_None);
}
else {
order = 'C';
picklebuf_args = Py_BuildValue("(O)", self);
order = 'K'; // won't be used
IndifferentArea marked this conversation as resolved.
Show resolved Hide resolved
const int n = PyArray_NDIM(self);
npy_stride_sort_item items[NPY_MAXDIMS];
// sort (strde, perm) as descending = transpose to C
PyArray_CreateSortedStridePerm(n, PyArray_STRIDES(self), items);
rev_perm = PyTuple_New(n);
seberg marked this conversation as resolved.
Show resolved Hide resolved
if(rev_perm == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(rev_perm == NULL) {
if (rev_perm == NULL) {

return NULL;
}
PyArray_Dims perm;
npy_intp d[NPY_MAXDIMS];
IndifferentArea marked this conversation as resolved.
Show resolved Hide resolved
for (int i = 0; i < n; i++) {
d[i] = items[i].perm;
PyTuple_SET_ITEM(rev_perm, items[i].perm, PyLong_FromLong(i));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit annoying since it can't actually fail in CPython. But maybe we should just add an error check on PyLong_FromLong here.

}
perm.ptr = d;
perm.len = n;
transposed_array = PyArray_Transpose((PyArrayObject *)self, &perm);
ngoldbaum marked this conversation as resolved.
Show resolved Hide resolved
if (!PyArray_IS_C_CONTIGUOUS((PyArrayObject *)transposed_array)) {
// self is non-contiguous
Py_DECREF(picklebuf_class);
Py_DECREF(rev_perm);
return array_reduce_ex_regular(self, protocol);
}
picklebuf_args = Py_BuildValue("(N)", transposed_array);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore, picklebuf_args is also rather silly as a tuple. We should just use PyObject_CallOneArg(transposed_array != NULL ? transposed_array : self).
Faster, and no need to fight with picklebuf_args cleanup.

Even better: if we cache picklebuf_class using the npy_cache_import_runtime pattern we also don't need to clean that up.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with these APIs but I tried.

npy_cache_import_runtime is fine but applying PyObject_CallOneArg in e17acbc (not sure if I used it correctly) causes numpy/_core/tests/test_multiarray.py::TestPickling::test_roundtrip failed due to memory leak. I got no idea how it is caused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, transposed_array needs decref

}
if (picklebuf_args == NULL) {
Py_DECREF(picklebuf_class);
Expand Down Expand Up @@ -1928,10 +1956,15 @@ array_reduce_ex_picklebuffer(PyArrayObject *self, int protocol)
return NULL;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyArray_IntTupleFromIntp can fail so you need to check if shape is NULL here.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, added


return Py_BuildValue("N(NONN)",
from_buffer_func, buffer, (PyObject *)descr,
PyObject_GetAttrString((PyObject *)self, "shape"),
PyUnicode_FromStringAndSize(&order, 1));

PyObject* shape = NULL;
if(order == 'K') {
IndifferentArea marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(order == 'K') {
if (order == 'K') {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

shape = PyObject_GetAttrString((PyObject *)transposed_array, "shape");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old messy code, so you can ignore: but we have a PyArray_IntTupleFromIntp helper to use PyArray_NDIM and PyArray_DIMS directly.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, applied

} else {
shape = PyObject_GetAttrString((PyObject *)self, "shape");
}
return Py_BuildValue("N(NONNO)", from_buffer_func, buffer, (PyObject *)descr, shape,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right, you lose rev_perm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean lose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leak? You need to decref it at some point or use N.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks. never wrote on cpython api before, sorry for these oblivious problems.

PyUnicode_FromStringAndSize(&order, 1), rev_perm);
}

static PyObject *
Expand All @@ -1946,8 +1979,6 @@ array_reduce_ex(PyArrayObject *self, PyObject *args)

descr = PyArray_DESCR(self);
if ((protocol < 5) ||
(!PyArray_IS_C_CONTIGUOUS((PyArrayObject*)self) &&
!PyArray_IS_F_CONTIGUOUS((PyArrayObject*)self)) ||
ngoldbaum marked this conversation as resolved.
Show resolved Hide resolved
PyDataType_FLAGCHK(descr, NPY_ITEM_HASOBJECT) ||
(PyType_IsSubtype(((PyObject*)self)->ob_type, &PyArray_Type) &&
((PyObject*)self)->ob_type != &PyArray_Type) ||
Expand Down
Binary file added BIN +323 Bytes numpy/_core/tests/data/legacy_contiguous.pkl
Binary file not shown.
25 changes: 25 additions & 0 deletions 25 numpy/_core/tests/test_multiarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -4389,6 +4389,31 @@ def test_f_contiguous_array(self):
buffers=buffers)

assert_equal(f_contiguous_array, depickled_f_contiguous_array)

def test_transposed_contiguous_array(self):
transposed_contiguous_array = np.random.rand(2, 3, 4).transpose((1, 0, 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transpose example is not chosen carefully, maybe you should just have multiple examples with pytest.parametrize and make at least one 4-5 dimensional so this is less likely to happen:

a = np.ones((1, 2, 3))
a.transpose(1, 0, 2).transpose(1, 0, 2).shape == a.shape

so I don't know from the test if the back-transpose is filled correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added more test case

buffers = []

# When using pickle protocol 5, Fortran-contiguous arrays can be
# serialized using out-of-band buffers
bytes_string = pickle.dumps(transposed_contiguous_array, protocol=5,
buffer_callback=buffers.append)

assert len(buffers) > 0

depickled_transposed_contiguous_array = pickle.loads(bytes_string,
buffers=buffers)

assert_equal(transposed_contiguous_array, depickled_transposed_contiguous_array)

def test_load_legacy_pkl(self):
IndifferentArea marked this conversation as resolved.
Show resolved Hide resolved
x = np.arange(24).reshape(3, 4, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits:

  1. Might still use dtype=np.uint8 to make the first few pickles a bit smaller
  2. I don't see a reason for a dictionary? Just use non_contiguous = ... (could use a parametrization, but might just get weird/messy with the long lines, so maybe better keep it as it is)

I don't understand why the linter isn't failing. Maybe add # noqa on the long lines anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# generated by same code in 2.2.1
data_dir = os.path.join(os.path.dirname(__file__), 'data')
legacy_pkl_path = os.path.join(data_dir, "legacy_contiguous.pkl")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this works (I suspect you could use dtype=np.int8 or so and just copy the whole bytes string into the code here also), but just if you like it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used byte str in code and added more examples

with open(legacy_pkl_path, 'rb') as f:
y = pickle.load(f)
assert_allclose(x, y)

def test_non_contiguous_array(self):
non_contiguous_array = np.arange(12).reshape(3, 4)[:, :2]
Expand Down
Loading
Morty Proxy This is a proxified and sanitized view of the page, visit original site.