-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays #27519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. I'll ask for more style cleanup before we merge this.
PyArrayObject* original_array = self; | ||
if (PyArray_BASE(self) != NULL) { | ||
original_array = (PyArrayObject*) PyArray_BASE(self); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make any sense to me. What does the base have to do with anything here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Removed.
if (PyArray_BASE(self) != NULL) { | ||
original_array = (PyArrayObject*) PyArray_BASE(self); | ||
} | ||
int bytes_not_swapped = PyArray_ISNOTSWAPPED(self) && PyArray_ISNOTSWAPPED(original_array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for byte-swap is (just like in the other PR) important. Are there tests that fail without it? Because if not, there should be a new test for it (just like the one in the other PR which is currently failing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test added
@@ -2801,6 +2801,57 @@ PyArray_CountNonzero(PyArrayObject *self) | ||
return nonzero_count; | ||
} | ||
|
||
static inline void nonzero_idxs_1D_bool(npy_intp count, npy_intp nonzero_count, char *data, npy_intp stride, npy_intp* multi_index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have formatting nits, but this is nice to move out!
++j; | ||
} | ||
} | ||
nonzero_idxs_1D_bool(count, nonzero_count, data, stride, multi_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this call belong to the family of fast functions in the other place? I.e. only the manual loop of nonzero
should stay here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep the diff short, but it indeed fits a bit better in the other block. Changed.
int M_type_num = dtype->type_num; | ||
|
||
bool executed = nonzero_idxs_dispatcher((void*)data, multi_index, M_dim, | ||
M_shape, M_strides, M_type_num, nonzero_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch is missing the thresholded GIL release (internals of it I guess).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Releasing the GIL (or running the cpython free-threading build) could result in a buffer overflow if another thread changes the number of non-zero entries in the array. I added a guard for that, but we should rerun the benchmarks (I still expect a good speedup though).
Should a test be added for this? (probably here: https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_multithreading.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, a case where we would actually want to lock down a whole array (but I think this is super hard with buffer exchange, so in practice...).
Would be very cool if you add a test. Larger use of free-threading is in the future after-all, and maybe not a too distant one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests have been added in #28361
Improve performance of
np.non_zero
for 1D and 2D contiguous arrays. The main cases improved are 2D boolean arrays and 1D/2D int and float arrays. The PR is based on work and ideas from @touqir14 in #18368Benchmark on main:
Benchmark on PR:
Benchmark script
Notes: