Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH: support no-copy pickling for any array that can be transposed to a C-contiguous array #28105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 48 commits into from
May 14, 2025

Conversation

IndifferentArea
Copy link
Contributor

As discussed in #26878, support no-copy pickling for any array that can be transposed to a C-contiguous array.

@IndifferentArea IndifferentArea changed the title ENH: support no-copy pickling for any array that can be transposed to a C-contiguous array WIP: ENH: support no-copy pickling for any array that can be transposed to a C-contiguous array Jan 6, 2025
@IndifferentArea IndifferentArea marked this pull request as draft January 6, 2025 05:54
@IndifferentArea
Copy link
Contributor Author

Hi, I’m new to contributing to NumPy and would appreciate some guidance on the following:

  • I’ve added a simple test for this feature. Are there additional test cases or scenarios I should consider?
  • Before requesting a review, should I document the changes in doc/release/upcoming_changes/ or elsewhere?

Thank you!

@IndifferentArea IndifferentArea changed the title WIP: ENH: support no-copy pickling for any array that can be transposed to a C-contiguous array ENH: support no-copy pickling for any array that can be transposed to a C-contiguous array Jan 7, 2025
@IndifferentArea IndifferentArea marked this pull request as ready for review January 7, 2025 02:15
@ngoldbaum
Copy link
Member

Before requesting a review, should I document the changes in doc/release/upcoming_changes/ or elsewhere?

Yes, see step 5 https://numpy.org/devdocs/dev/index.html#development-process-summary.

I'm also not totally sure what the policy is on changing the array pickle format, since people use pickles for uses they are not well-suited but we still need to support those uses anyway...

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, we should discuss it but I think we can change it so long we make sure that the new NumPy can read the old pickles (and that also means adding a test that would currently fail, I think).

Also you need a test for the non-contiguous protocol=5 (unless there already is).

@@ -1920,8 +1920,10 @@ def fromfunction(function, shape, *, dtype=float, like=None, **kwargs):
_fromfunction_with_like = array_function_dispatch()(fromfunction)


def _frombuffer(buf, dtype, shape, order):
return frombuffer(buf, dtype=dtype).reshape(shape, order=order)
def _frombuffer(buf, dtype, shape, order, axis):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function must still be able to deal with old pickles. It is effectively public API.

A test should have existed to find this. We should use pickle.dumps(...) on a current NumPy version, take that string and ensure that pickle.loads(...) on it still works (i.e. freeze the pickle in time).

This looks like it breaks existing pickles and we shouldn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given axis a default value None and added a test on a legacy pkl dumped in 2.2.1.

Should I keep this test in code? Or should I move it in test_regression.py?

if (PyArray_IS_C_CONTIGUOUS((PyArrayObject *)self)) {
order = 'C';
picklebuf_args = Py_BuildValue("(O)", self);
rev_perm = Py_BuildValue("O", Py_None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might use PyTuple_Pack() where possible (even refactor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, you mean construct an empty tuple instead of None to rev_perm on c_contiguous and f_contiguous path?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I would avoid Py_BuildValue when it is easy to do so, and it is here. But also OK to keep: this is inherited.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied as you suggested.

for (int i = n - 1, check_s = 1; i >= 0; i--) {
if (check_s * PyArray_DESCR(self)->elsize !=
items[i].stride) {
return array_reduce_ex_regular(self, protocol);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we assume it works usually and/or the other path is slow anyway, could also just not check this and see if the transpose works out?
Not sure that is better, though.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...Applied, this is much more straightforward!

Not sure if I Py_DECREF correctly. :(

numpy/_core/src/multiarray/methods.c Show resolved Hide resolved
} else {
shape = PyObject_GetAttrString((PyObject *)self, "shape");
}
return Py_BuildValue("N(NONNO)", from_buffer_func, buffer, (PyObject *)descr, shape,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right, you lose rev_perm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean lose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leak? You need to decref it at some point or use N.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks. never wrote on cpython api before, sorry for these oblivious problems.

@IndifferentArea
Copy link
Contributor Author

IndifferentArea commented Jan 9, 2025

Also you need a test for the non-contiguous protocol=5 (unless there already is).

There is already a test for that

def test_non_contiguous_array(self):
non_contiguous_array = np.arange(12).reshape(3, 4)[:, :2]
assert not non_contiguous_array.flags.c_contiguous
assert not non_contiguous_array.flags.f_contiguous
# make sure non-contiguous arrays can be pickled-depickled
# using any protocol
for proto in range(2, pickle.HIGHEST_PROTOCOL + 1):
depickled_non_contiguous_array = pickle.loads(
pickle.dumps(non_contiguous_array, protocol=proto))
assert_equal(non_contiguous_array, depickled_non_contiguous_array)

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the new test looks good. A few more comment, and I have one more comment actually: I wonder if we should just pickle C/F contiguous ones without passing the axes order so that they will still unpickle on older versions (because it is easy, not because it is required).

} else {
shape = PyObject_GetAttrString((PyObject *)self, "shape");
}
return Py_BuildValue("N(NONNO)", from_buffer_func, buffer, (PyObject *)descr, shape,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leak? You need to decref it at some point or use N.

x = np.arange(24).reshape(3, 4, 2)
# generated by same code in 2.2.1
data_dir = os.path.join(os.path.dirname(__file__), 'data')
legacy_pkl_path = os.path.join(data_dir, "legacy_contiguous.pkl")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this works (I suspect you could use dtype=np.int8 or so and just copy the whole bytes string into the code here also), but just if you like it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used byte str in code and added more examples

if (PyArray_IS_C_CONTIGUOUS((PyArrayObject *)self)) {
order = 'C';
picklebuf_args = Py_BuildValue("(O)", self);
rev_perm = Py_BuildValue("O", Py_None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I would avoid Py_BuildValue when it is easy to do so, and it is here. But also OK to keep: this is inherited.

numpy/_core/src/multiarray/methods.c Show resolved Hide resolved
@@ -4389,6 +4389,31 @@ def test_f_contiguous_array(self):
buffers=buffers)

assert_equal(f_contiguous_array, depickled_f_contiguous_array)

def test_transposed_contiguous_array(self):
transposed_contiguous_array = np.random.rand(2, 3, 4).transpose((1, 0, 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transpose example is not chosen carefully, maybe you should just have multiple examples with pytest.parametrize and make at least one 4-5 dimensional so this is less likely to happen:

a = np.ones((1, 2, 3))
a.transpose(1, 0, 2).transpose(1, 0, 2).shape == a.shape

so I don't know from the test if the back-transpose is filled correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added more test case


PyObject* shape = NULL;
if(order == 'K') {
shape = PyObject_GetAttrString((PyObject *)transposed_array, "shape");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old messy code, so you can ignore: but we have a PyArray_IntTupleFromIntp helper to use PyArray_NDIM and PyArray_DIMS directly.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, applied

numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
numpy/_core/numeric.py Outdated Show resolved Hide resolved
numpy/_core/tests/test_multiarray.py Outdated Show resolved Hide resolved
@IndifferentArea
Copy link
Contributor Author

Apologies for the interruption. I‘m new to the numpy community, and this PR was progressing well, but it had been stalled for a week. So I'm concerned if there might be any issues that need addressing. I truly appreciate your attention and apologize for any inconvenience caused.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted some issues. Please give this entire PR a careful once-over to make sure you're catching error cases and appropriately propagating errors when they happen. If you're feeling ambitious you could try writing test cases for the error cases, error cases tend to be poorly tested and buggy.

@pytest.mark.parametrize('transposed_contiguous_array',
[np.random.rand(2, 3, 4).transpose((1, 0, 2)),
np.random.rand(2, 3, 4, 5).transpose((1, 3, 0, 2))] +
[np.random.rand(*np.arange(2, 7)).transpose(np.random.permutation(5)) for _ in range(3)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a seeded RNG local to the test instead of the global np.random RNG

This is still using the global np.random RNG. If you want the test to be reproducible, you need to use a seeded RNG local to the test, e.g. using this function.

Comment on lines 1947 to 1949
if (transposed_array != NULL) {
Py_DECREF(transposed_array);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (transposed_array != NULL) {
Py_DECREF(transposed_array);
}
Py_XDECREF(transposed_array);

This pattern is common enough it has its own macro in the C API. There are some other common operations that have an X variant that does NULL checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, applied

else {
shape = PyArray_IntTupleFromIntp(PyArray_NDIM(self),
PyArray_SHAPE(self));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyArray_IntTupleFromIntp can fail so you need to check if shape is NULL here.

Copy link
Contributor Author

@IndifferentArea IndifferentArea Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, added

Comment on lines 1935 to 1937
if (transposed_array != NULL) {
Py_DECREF(transposed_array);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (transposed_array != NULL) {
Py_DECREF(transposed_array);
}
Py_XDECREF(transposed_array);

picklebuf_class = PyObject_GetAttrString(pickle_module, "PickleBuffer");
Py_DECREF(pickle_module);
if (picklebuf_class == NULL) {
if (npy_cache_import_runtime("pickle", "PickleBuffer", &picklebuf_class)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (npy_cache_import_runtime("pickle", "PickleBuffer", &picklebuf_class)) {
if (npy_cache_import_runtime("pickle", "PickleBuffer", &picklebuf_class) == -1) {

A little surprised the pickle tests pass given this bug...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. sorry i forgot checking the return value of npy_cache_import_runtime properly.

Since npy_cache_import_runtime will only return -1 or 0, I think it didn't actually cause a bug in the improper version so pytest didn't detect this.

numpy/_core/src/multiarray/methods.c Show resolved Hide resolved
numpy/_core/src/multiarray/methods.c Show resolved Hide resolved
@IndifferentArea
Copy link
Contributor Author

ping

@ngoldbaum
Copy link
Member

Thanks for the ping and sorry for dropping this, I'm going to take another look at this.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, sorry for taking so long to get back to this.

This is looking much better than the last time I reviewed, thanks for the careful updates.

I'm approving but I want another experienced developer to look this over, since changing the pickle format has annoying backward compatibility concerns associated with making any mistakes.

@seberg or maybe @mhvk would either of you be interested in giving this another pass?

numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
@ngoldbaum ngoldbaum added this to the 2.3.0 release milestone Mar 25, 2025
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for forgetting about this (it was great that you made many of the other smaller cleanups here!).

I think it's good, I have one substantial ask, unfortunately. Because I really think we should try to allow old NumPy versions to load most pickles and I don't think this does.

numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/methods.c Outdated Show resolved Hide resolved
@IndifferentArea
Copy link
Contributor Author

seems something's wrong with freebsd's ci

@seberg
Copy link
Member

seberg commented Mar 26, 2025

@IndifferentArea don't worry about it. But if you would like to see it green, try merging/rebasing the main branch. That would likely fix it.

@charris charris merged commit d32cf93 into numpy:main May 14, 2025
72 checks passed
@github-project-automation github-project-automation bot moved this from Awaiting a code review to Completed in NumPy first-time contributor PRs May 14, 2025
@charris
Copy link
Member

charris commented May 14, 2025

Thanks @IndifferentArea

@eendebakpt
Copy link
Contributor

@charris @seberg This PR introduced some linting errors on my local system. For example in numpy/_core/tests/test_multiarray.py line 4452 a trailing whitespace. The CI was green for the PR, in particular the lint test: https://github.com/numpy/numpy/actions/runs/14085614702/job/39448799495.

Not sure what happened here. The ruff version used in the CI is a bit older, but on my system it gives the same errors. I will create a PR to update ruff and address the trailing whitespace.

@ngoldbaum
Copy link
Member

It probably needed a rebase before merging.

@IndifferentArea
Copy link
Contributor Author

IndifferentArea commented May 15, 2025

Awesome to see it merged!
been busy with school things recently :(
Thanks so much for your help with this pr, and I'm excited to contribute more in the future!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

5 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.