Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays #27519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
Loading
from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Oct 6, 2024

Improve performance of np.non_zero for 1D and 2D contiguous arrays. The main cases improved are 2D boolean arrays and 1D/2D int and float arrays. The PR is based on work and ideas from @touqir14 in #18368

Benchmark on main:

<module 'numpy' from '/home/eendebakpt/numpy/build-install/lib/python3.12/site-packages/numpy/__init__.py'> np.__version__='2.2.0.dev0+git20241010.a5cb327'
bool 816.1 [us]
int64 2855.6 [us]
float64 5777.2 [us]
bool 2D 515.2 [us]
float64 2D 824.7 [us]
float64 2D view 149.7 [us]

Benchmark on PR:

<module 'numpy' from '/home/eendebakpt/numpy/build-install/lib/python3.12/site-packages/numpy/__init__.py'> np.__version__='2.2.0.dev0+git20241011.60c9b88'
bool 845.4 [us]
int64 2004.0 [us]
float64 3997.1 [us]
bool 2D 85.2 [us]
float64 2D 302.7 [us]
float64 2D view 150.2 [us]
Benchmark script
import sys
import timeit
import numpy as np

print(f'{np} {np.__version__=}')

b=np.arange(1_000_000).astype(bool)
x=b.astype(int)
f=b.astype(np.float64)
w=np.random.rand(600, 100)
w[w<.5]=0
ww = w[::2, ::3]
b2d=np.random.rand(600, 100) > .5

n=100
t = timeit.timeit('np.nonzero(b)', globals=globals(), number=n)
print(f'{b.dtype} {1e6*t/n:.1f} [us]')
t = timeit.timeit('np.nonzero(x)', globals=globals(), number=n)
print(f'{x.dtype} {1e6*t/n:.1f} [us]')
t = timeit.timeit('np.nonzero(f)', globals=globals(), number=n)
print(f'{f.dtype} {1e6*t/n:.1f} [us]')
t = timeit.timeit('np.nonzero(b2d)', globals=globals(), number=n)
print(f'{b2d.dtype} 2D {1e6*t/n:.1f} [us]')
t = timeit.timeit('np.nonzero(w)', globals=globals(), number=n)
print(f'{w.dtype} 2D {1e6*t/n:.1f} [us]')
t = timeit.timeit('np.nonzero(ww)', globals=globals(), number=n)
print(f'{w.dtype} 2D view {1e6*t/n:.1f} [us]')

Notes:

@seberg seberg marked this pull request as draft October 9, 2024 09:17
@eendebakpt eendebakpt changed the title Draft: ENH: Improve performance of numpy.nonzero for 1D contiguous arrays Draft: ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays Oct 11, 2024
@eendebakpt eendebakpt changed the title Draft: ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays Oct 11, 2024
@eendebakpt eendebakpt marked this pull request as ready for review October 11, 2024 12:32
@eendebakpt eendebakpt requested a review from seberg November 25, 2024 12:40
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. I'll ask for more style cleanup before we merge this.

PyArrayObject* original_array = self;
if (PyArray_BASE(self) != NULL) {
original_array = (PyArrayObject*) PyArray_BASE(self);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make any sense to me. What does the base have to do with anything here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Removed.

if (PyArray_BASE(self) != NULL) {
original_array = (PyArrayObject*) PyArray_BASE(self);
}
int bytes_not_swapped = PyArray_ISNOTSWAPPED(self) && PyArray_ISNOTSWAPPED(original_array);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for byte-swap is (just like in the other PR) important. Are there tests that fail without it? Because if not, there should be a new test for it (just like the one in the other PR which is currently failing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added

@@ -2801,6 +2801,57 @@ PyArray_CountNonzero(PyArrayObject *self)
return nonzero_count;
}

static inline void nonzero_idxs_1D_bool(npy_intp count, npy_intp nonzero_count, char *data, npy_intp stride, npy_intp* multi_index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have formatting nits, but this is nice to move out!

++j;
}
}
nonzero_idxs_1D_bool(count, nonzero_count, data, stride, multi_index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this call belong to the family of fast functions in the other place? I.e. only the manual loop of nonzero should stay here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the diff short, but it indeed fits a bit better in the other block. Changed.

int M_type_num = dtype->type_num;

bool executed = nonzero_idxs_dispatcher((void*)data, multi_index, M_dim,
M_shape, M_strides, M_type_num, nonzero_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is missing the thresholded GIL release (internals of it I guess).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Releasing the GIL (or running the cpython free-threading build) could result in a buffer overflow if another thread changes the number of non-zero entries in the array. I added a guard for that, but we should rerun the benchmarks (I still expect a good speedup though).

Should a test be added for this? (probably here: https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_multithreading.py)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, a case where we would actually want to lock down a whole array (but I think this is super hard with buffer exchange, so in practice...).

Would be very cool if you add a test. Larger use of free-threading is in the future after-all, and maybe not a too distant one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests have been added in #28361

@eendebakpt eendebakpt requested a review from seberg February 27, 2025 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.