-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: fancy indexing copy #26558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fancy indexing copy #26558
Conversation
@@ -1460,6 +1460,9 @@ mapiter_trivial_@name@( | ||
char *base_ptr, *ind_ptr, *result_ptr; | ||
npy_intp self_stride, ind_stride, result_stride; | ||
npy_intp fancy_dim = PyArray_DIM(self, 0); | ||
if (PyArray_DATA(self) == PyArray_DATA(result)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the may share memory functionality here. Also would be good to do this at the call site (mapping.c), I think and make sure to decref the array again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the "may share memory" stuff carry the risk of a more substantial slow down? I think that's what I was asking about in the issue, at some point there's a tradeoff I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, may share memory does check memory bounds first and gives up at some point if things get complicated.
I.e. the simple cases are fast, some slow cases may end up with a defensive copy.
E.g. ufuncs end up using solve_may_share_memory
with max_work=1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised the code to use solve_may_share_memory
one level up in the control flow, in mapping.c
.
make sure to decref the array again
I couldn't replicate the need for that, which array am I increasing the refcount for? Don't we proceed through a deep copy here?
I think this only covers a very well-behaved/specialized fancy indexing codepath based on the code comments, though that's also true of the regression test, 1-D contiguous integer data, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think it nicer, although I also think it should be even further up maybe.
Some notes:
- We ignore overlap in the
indices
here. We actually have a pathindex_has_memory_overlap
which is used indirectly by the code fromufunc.at
and does all the checks necessary including for index arrays. - I see no reason why you cannot move the check to before the fast-path
if
? In facttmp_arr
gets prepared just above, which seems like a perfect opportunity to do that check.
(I suppose in theory, overlap is possible even if a fresh array is created withFromAny
, so could do it after theif
rather than before.)
One thing that might be nice, is that I am pretty sure we already guard for the other copy paths against this, e.g.:
arr = np.arange(100)
arr[10:] = arr[:-10]
would be nice to check we have tests and add one together with the others if not (so we can proof we always make a defensive copy in indexing paths).
f766cdd
to
4a8a367
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think we should move it up even further and since it seems simple (the utility exists) also guard against index overlap.
numpy/_core/src/multiarray/mapping.c
Outdated
@@ -1999,6 +1999,10 @@ array_assign_subscript(PyArrayObject *self, PyObject *ind, PyObject *op) | ||
goto fail; | ||
} | ||
|
||
if (solve_may_share_memory(self, tmp_arr, 1) == 1) { | ||
tmp_arr = (PyArrayObject *)PyArray_NewCopy(tmp_arr, NPY_ANYORDER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tmp_arr = (PyArrayObject *)PyArray_NewCopy(tmp_arr, NPY_ANYORDER); | |
Py_SETREF(tmp_arr, PyArray_NewCopy(tmp_arr, NPY_ANYORDER)); |
You would leak the old tmp_arr
@@ -1460,6 +1460,9 @@ mapiter_trivial_@name@( | ||
char *base_ptr, *ind_ptr, *result_ptr; | ||
npy_intp self_stride, ind_stride, result_stride; | ||
npy_intp fancy_dim = PyArray_DIM(self, 0); | ||
if (PyArray_DATA(self) == PyArray_DATA(result)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think it nicer, although I also think it should be even further up maybe.
Some notes:
- We ignore overlap in the
indices
here. We actually have a pathindex_has_memory_overlap
which is used indirectly by the code fromufunc.at
and does all the checks necessary including for index arrays. - I see no reason why you cannot move the check to before the fast-path
if
? In facttmp_arr
gets prepared just above, which seems like a perfect opportunity to do that check.
(I suppose in theory, overlap is possible even if a fresh array is created withFromAny
, so could do it after theif
rather than before.)
One thing that might be nice, is that I am pretty sure we already guard for the other copy paths against this, e.g.:
arr = np.arange(100)
arr[10:] = arr[:-10]
would be nice to check we have tests and add one together with the others if not (so we can proof we always make a defensive copy in indexing paths).
* Fixes numpygh-26542 * For these very simple advanced indexing cases, if the `result` and `self` arrays share the same data pointers, use a copy of `result` to assign values to `self`, otherwise the outcome of the fancy indexing suffers from mutation of `result` before the assignments are complete.
* Avoid leaking the old `tmp_arr`, based on reviewer feedback. * Add a test for a similar 2D case that fails, then hoist `solve_may_share_memory()` check farther up in the control flow such that the test passes. * Add a reviewer-requested test for index overlap.
4a8a367
to
bf5042d
Compare
@seberg I added a test for a more "general" case (2d version of same problem basically) to justify hoisting the memory overlap check farther up in the control flow as you requested (it did indeed fail before/pass after). Hoisting did require an additional null check. I also added the test case you mentioned, which seems to pass just fine. I'm not sure I fully grasped your comment on overlapping indices, if that's more sophisticated than the test case you cited. Perhaps you can provide/describe a sample test case I need to think about if I'm still missing something. Your additional test does have overlapping indices IIUC, but presumably I won't get off the hook that easily. |
numpy/_core/src/multiarray/mapping.c
Outdated
@@ -1960,6 +1960,10 @@ array_assign_subscript(PyArrayObject *self, PyObject *ind, PyObject *op) | ||
tmp_arr = (PyArrayObject *)op; | ||
} | ||
|
||
if (tmp_arr && solve_may_share_memory(self, tmp_arr, 1) == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (tmp_arr && solve_may_share_memory(self, tmp_arr, 1) == 1) { | |
if (tmp_arr && solve_may_share_memory(self, tmp_arr, 1) != 0) { |
Since < 0
means "failed to solve", which would mean we should do a defensive copy.
We will be missing if the index has overlap, such as something awkward like:
a = np.arange(10)
a[2:][a[:-2]] = 3
but let's just accept that here and make a new issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the suggestion applied, if you are happy @tylerjereddy, thanks.
* The usage of `solve_may_share_memory` in the above PR wasn't quite right since it ignored the case of failing to solve the overlap problem. This has been revised according to reviewer feedback. Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
@seberg Ok, I revised accordingly and added you as co-author. The cross-listed issue explains the additional case that needs looking at in a follow-up. |
Thanks for being so thorough, could probably just have done it myself... let's get it in :). |
* BUG: fancy indexing copy * Fixes numpygh-26542 * For these very simple advanced indexing cases, if the `result` and `self` arrays share the same data pointers, use a copy of `result` to assign values to `self`, otherwise the outcome of the fancy indexing suffers from mutation of `result` before the assignments are complete. * MAINT, BUG: PR 26558 revisions * Avoid leaking the old `tmp_arr`, based on reviewer feedback. * Add a test for a similar 2D case that fails, then hoist `solve_may_share_memory()` check farther up in the control flow such that the test passes. * Add a reviewer-requested test for index overlap. * BUG: PR 26558 revisions * The usage of `solve_may_share_memory` in the above PR wasn't quite right since it ignored the case of failing to solve the overlap problem. This has been revised according to reviewer feedback. Co-authored-by: Sebastian Berg <sebastianb@nvidia.com> --------- Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
* BUG: fancy indexing copy * Fixes numpygh-26542 * For these very simple advanced indexing cases, if the `result` and `self` arrays share the same data pointers, use a copy of `result` to assign values to `self`, otherwise the outcome of the fancy indexing suffers from mutation of `result` before the assignments are complete. * MAINT, BUG: PR 26558 revisions * Avoid leaking the old `tmp_arr`, based on reviewer feedback. * Add a test for a similar 2D case that fails, then hoist `solve_may_share_memory()` check farther up in the control flow such that the test passes. * Add a reviewer-requested test for index overlap. * BUG: PR 26558 revisions * The usage of `solve_may_share_memory` in the above PR wasn't quite right since it ignored the case of failing to solve the overlap problem. This has been revised according to reviewer feedback. Co-authored-by: Sebastian Berg <sebastianb@nvidia.com> --------- Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
* BUG: fancy indexing copy * Fixes numpygh-26542 * For these very simple advanced indexing cases, if the `result` and `self` arrays share the same data pointers, use a copy of `result` to assign values to `self`, otherwise the outcome of the fancy indexing suffers from mutation of `result` before the assignments are complete. * MAINT, BUG: PR 26558 revisions * Avoid leaking the old `tmp_arr`, based on reviewer feedback. * Add a test for a similar 2D case that fails, then hoist `solve_may_share_memory()` check farther up in the control flow such that the test passes. * Add a reviewer-requested test for index overlap. * BUG: PR 26558 revisions * The usage of `solve_may_share_memory` in the above PR wasn't quite right since it ignored the case of failing to solve the overlap problem. This has been revised according to reviewer feedback. Co-authored-by: Sebastian Berg <sebastianb@nvidia.com> --------- Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
* BUG: fancy indexing copy * Fixes numpygh-26542 * For these very simple advanced indexing cases, if the `result` and `self` arrays share the same data pointers, use a copy of `result` to assign values to `self`, otherwise the outcome of the fancy indexing suffers from mutation of `result` before the assignments are complete. * MAINT, BUG: PR 26558 revisions * Avoid leaking the old `tmp_arr`, based on reviewer feedback. * Add a test for a similar 2D case that fails, then hoist `solve_may_share_memory()` check farther up in the control flow such that the test passes. * Add a reviewer-requested test for index overlap. * BUG: PR 26558 revisions * The usage of `solve_may_share_memory` in the above PR wasn't quite right since it ignored the case of failing to solve the overlap problem. This has been revised according to reviewer feedback. Co-authored-by: Sebastian Berg <sebastianb@nvidia.com> --------- Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
Fixes Fancy indexing fails on a[idx] = a #26542
For these very simple advanced indexing cases, if the
result
andself
arrays share the same data pointer, use a copy ofresult
to assign values toself
, otherwise the outcome of the fancy indexing suffers from mutation ofresult
before the assignments are complete.