MAINT: Speed up numpy.nonzero. #18368
Conversation
Please note the NumPy …

I should mention that I just found out that the workstation on which I ran the benchmarking script also had the …
    to_jmp = nonzero_idxs_dispatcher_ND((void*)data, multi_index, PyArray_SHAPE(self), PyArray_STRIDES(self), dtype->type_num, nonzero_count, ndim);
} else if (dtype->byteorder == '<') {
    to_jmp = nonzero_idxs_dispatcher_ND((void*)data, multi_index, PyArray_SHAPE(self), PyArray_STRIDES(self), dtype->type_num, nonzero_count, ndim);
} else if (dtype->byteorder == '|' && dtype->elsize == 1) {
Fly-by comment; I have not looked at this enough yet. We have PyArray_ISNBO for this; the dispatch as it stands seems overly complex. (I am also confused about the == '<' check: are you assuming a little-endian machine?)
I might love to think about putting this on the dtypes, but it is probably just hassle here.
One other thing to think about: make sure you time your results with Fortran-ordered (and sliced Fortran-ordered) arrays. Also, you may not really need the 2-D or 3-D loops much if you check whether you can iterate the array as 1-D instead. We have macros for that: PyArray_TRIVIALLY_ITERABLE.
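A quick way to sanity-check those layouts from Python (my own sketch; the array size and iteration count are arbitrary, and the timings are machine-dependent):

```python
import numpy as np
import timeit

# Same data in C order, Fortran order, and as a sliced Fortran view
a = np.random.default_rng(0).random((1000, 1000)) > 0.5
f = np.asfortranarray(a)
for name, arr in [("C-contiguous", a), ("F-contiguous", f), ("F sliced", f[::2, ::2])]:
    t = timeit.timeit(lambda: np.nonzero(arr), number=5)
    print(f"{name:>13}: {t:.4f}s")
```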
The optimization exploits C-contiguousness of arrays (without slices). With a little bit of change the code can be adjusted to work with slices. However, for such cases the bottleneck will be cache misses when you have large arrays (and slices covering a small part), so we might as well fall back to the generic iterator approach (that's what it currently does for slices). For F-contiguous arrays I will have to use different loops (starting iteration from the first dimension and then moving inwards, instead of the opposite needed for C-contiguous arrays). I can definitely add those loops to also cover F-contiguous arrays alongside the ones I have for C-contiguous arrays.
The reason I have multiple loops is to squeeze out as much performance as possible. Alternatively, I could use a single loop and rely on multiple conditionals within it, which would be slower.
The way NumPy works with byte ordering confuses me a bit. Initially I did not have those conditions checking byteorder, and there is this one test case which was failing: https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_regression.py#L1674-L1679 . The failure happens after byteswapping. Why is NumPy treating a byteswapped array as if it was not byteswapped? My optimization does take into account that the array got byteswapped, and the result changes after byteswapping; the old implementation, however, ignores the byteswapping. For the sake of consistency I thought to follow that same behaviour. The byteorder condition hack that you referenced allowed the optimization to pass that test (since the optimization is skipped if the byteorder was changed to big-endian, and the default iterator-based implementation is used instead).
So, by default, all arrays except the ones with elsize == 1 will have byteorder '=' irrespective of the endianness of the system? If that is the case, when does an array have '>' or '<'? What I intend to do is place some conditions that detect that the byte order was changed, so that I can fall back to the default iterator approach, which gives the same result even after byteswapping. Do you have any recommendation that is more reliable than that hack? There is one build failure which I am guessing is due to that hack being unreliable.
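For reference, the byteorder characters in question can be inspected from Python (a quick check of my own, not part of the PR):

```python
import numpy as np

native = np.dtype(np.int64)
assert native.byteorder == '='              # multi-byte native types report '='
assert np.dtype(np.uint8).byteorder == '|'  # single-byte types: order not applicable

swapped = native.newbyteorder()             # explicitly non-native dtype
assert swapped.byteorder in ('>', '<')
assert not swapped.isnative
```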
Nevermind, yeah. For nonzero it matters a lot what the input shape was, since it decides how many outputs you need.
By default it is probably '=', or '|' if byte swapping is not valid. But don't rely on that, since it doesn't always hold currently. You can also request these explicitly, e.g. arr = np.ones(3, dtype=">i8").
The arr.byteswap() and arr.newbyteorder() methods are a bit strange maybe, since one swaps the bytes without modifying the dtype, and the other swaps the dtype but not the data. So in both cases the represented values change. If you do both, the represented values stay the same though (which is what that test does).
Just use that macro, which does the correct thing by rather checking the opposite: if it is not native byte order then it must be '>' (or '<' on a big-endian machine).
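The byteswap()/newbyteorder() distinction described above can be demonstrated directly (my own illustration; the dtype swap is done via dtype.newbyteorder() plus a view, which is equivalent):

```python
import numpy as np

a = np.arange(3, dtype=np.int64)

swapped = a.byteswap()                     # swaps the bytes, dtype unchanged
assert swapped.dtype == a.dtype
assert swapped[1] != a[1]                  # represented value changed

reinterp = a.view(a.dtype.newbyteorder())  # swaps the dtype, data unchanged
assert reinterp.dtype.byteorder != a.dtype.byteorder
assert reinterp[1] != a[1]                 # represented value changed here too

both = a.byteswap().view(a.dtype.newbyteorder())
assert (both == a).all()                   # doing both preserves the values
```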
I have replaced that hack with PyArray_ISNBO. Should I go ahead and implement the loops for F-contiguous arrays? Is there any documentation available on these handy macros, or can one only dig into the codebase to find them (or just ask around) 😄?
Most of these macros are public API, which is largely documented in the reference manual. Beyond that there isn't much documentation, so in some cases there may not be any.
Sorry about the F-contiguous thing; it may be nice, but I am not sure it is worth the trouble (at least as a first step). Just be sure that it doesn't get much slower for large F-contiguous arrays.
@seberg, I have pushed recent commits that optimize for F-contiguous arrays and views, i.e. arrays with a Fortran-based memory layout. So, currently, any aligned and contiguous array will enjoy the optimization. That's it from me for now; you can continue with the review. Regarding the speed on F-contiguous arrays: it matches that of C-contiguous arrays given the shapes are transposed. So a C-contiguous array with shape [m, n] will attain the same speed as an F-contiguous array with shape [n, m], given they have the same values (since loops stop early if there are no remaining nonzero values).
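The C/F equivalence claimed above is easy to spot-check (my own snippet): the result must be layout-independent, with the same indices in the same C order.

```python
import numpy as np

c = np.zeros((4, 3))
c[1, 2] = c[3, 0] = 1.0
f = np.asfortranarray(c)   # same values, Fortran memory layout
assert f.flags.f_contiguous

assert np.array_equal(np.nonzero(f), np.nonzero(c))
```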
I was just now surprised by how much slower nonzero is compared to the comparison itself, i.e. …
There are a few things left to do, but I could help a bit if you are running low on enthusiasm: …
Question to be sure (maybe it is obvious from the tests, in which case, don't bother): …
Backburner: …
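The comparison-vs-nonzero gap mentioned above can be measured with a sketch like the following (my own snippet; timings are machine-dependent, so no numbers are claimed):

```python
import numpy as np
import timeit

x = np.random.default_rng(0).random(1_000_000)
t_cmp = timeit.timeit(lambda: x > 0.5, number=20)
t_nz = timeit.timeit(lambda: np.nonzero(x > 0.5), number=20)
print(f"comparison alone: {t_cmp:.4f}s   nonzero(comparison): {t_nz:.4f}s")
```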
I am actually not acquainted with the NumPy template language. It is possible to combine the F- and C-style loops using more macros, but I intentionally left them separated for better code readability and maintainability. Still, if we must combine the F- and C-style loops, how about we try to do that in a fresh PR? We can then also think about ways to generalize to higher-dimensional loops. The output of …
The use of …
My question about the order of the output is about this example: …
I.e. it's easy to imagine that the last might instead return: …
which is also correct but different.
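To make the ordering question concrete (my own example, standing in for the ones collapsed above): NumPy currently emits indices in C (row-major) traversal order, while a column-major traversal would be an equally valid but different answer.

```python
import numpy as np

x = np.array([[0, 1],
              [1, 1]])
i, j = np.nonzero(x)

# Current behaviour: row-major traversal order
assert list(zip(i, j)) == [(0, 1), (1, 0), (1, 1)]
# A column-major traversal would give [(1, 0), (0, 1), (1, 1)] instead.
```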
Actually, this implementation returns the second one: …
I have to think about it, but it is a subtle change that I would prefer we don't include here yet (to ensure that the rest doesn't get stalled because of it!).
I like the …
I meant progress on this PR getting stalled because of the question of whether changing the output order is OK. A new …
@seberg, I am getting this error after modifying the low-level strided loops file: …
@touqir14 sounds like the "preprocessor" is breaking on something; maybe you are using @elsize@ but did not properly define a loop.
There was a typo in one of the templating constructs I wrote. I have pushed the ported code to the lowlevel_strided_loops file. Let me know how it looks.
}

#define nonzero_idxs_3D_F(data, idxs, shape, strides, nonzero_count, dtype) \
I was hoping to avoid these giant inline macros; maybe a bit out of habit in NumPy, but I don't really like them, and you could make a repeat for the dtype.
You can use such a repeat even inside the "dispatcher" function body, and here itself; then these become proper functions.
Now, I am not quite sure it will actually end up much better, since we don't need the functions (as in, functions you can get a pointer for) here.
I have converted C and F style loops into functions and condensed the dispatcher function as you suggested @seberg .
Just added the benchmarks as well. Using the SIMD-optimized count_nonzero function (applicable for bool and int), doing np.nonzero(array) is faster than np.nonzero(array > 0.5).
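A small consistency check of my own illustrates why the SIMD count is a useful first pass: the index array np.nonzero returns is exactly count_nonzero elements long.

```python
import numpy as np

x = np.random.default_rng(0).random(100_000) > 0.5   # bool array
idx, = np.nonzero(x)
assert idx.size == np.count_nonzero(x)
assert x[idx].all()   # every returned index hits a True element
```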
Btw, what's wrong with CI's smoke test? Looks like there is a problem with pip.
@mattip, I have fixed the errors. Would you be able to review this?
Hmm. @seberg I would think this refactor should bring …
@mattip you are right, it's a step in the opposite direction unfortunately, although not a particularly big one. The problem is that nonzero is slightly special, because it has to allocate the correct output shape first (using …). I couldn't think of an obvious way to wrangle it into a gufunc, but maybe that is also because we don't have many gufuncs yet. So, considering how massive the speedup is, I was willing to make an exception, with the hope that we can consolidate it again.
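The shape-first constraint can be sketched as a pure-Python reference model (not the C implementation): one pass to size the output, one pass to fill it.

```python
import numpy as np

def nonzero_1d(a):
    n = np.count_nonzero(a)            # first pass: output size
    out = np.empty(n, dtype=np.intp)   # allocate the exact result up front
    k = 0
    for i, v in enumerate(a):          # second pass: fill in the indices
        if v != 0:
            out[k] = i
            k += 1
    return (out,)

x = np.array([0, 3, 0, 0, 5, 7])
assert np.array_equal(nonzero_1d(x)[0], np.nonzero(x)[0])
```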
ok, thanks
Close/reopen to restart CI
CI configuration needs updating. The errors are not due to this PR.
Thanks, I trust that the code does the right thing. I am sorry about the bazillion style nits, but the code is pretty liberal with breaking the NumPy style guidelines.
I am wondering if you can't rewrite the while loops as for loops (potentially with an empty test field). Also, I think this might be one place where using NPY_GCC_OPT_3 is worthwhile, and likely a much bigger difference than manually optimizing away a single if.
I would be happier if we can write the if here; it just makes for easier-to-read code. But if you time it and the micro-optimization is worth it even with -O3, then that is fine as well.
If it's super annoying, I could try to do some of the style fixups for you, I guess.
npy_intp a = 0;
while (added_count < nonzero_count) {
    *idxs = a;
    bool to_increment = ((bool) *data);
I am pretty sure this is better written as != 0. Casting NaN or inf to bool might set the "invalid" floating-point exception, which we do not want here. We probably never actually check the FPE flags here, but I still think it's better.
Can you write this to_increment as an if? I trust the compiler can generate roughly the same code; this is not some important micro-optimization?
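For the record, NaN and inf compare unequal to zero, so the != 0 form reports them as nonzero just like the bool cast would, while avoiding the possible invalid-FPE of a float-to-bool cast at the C level (quick check of the expected semantics, my own snippet):

```python
import numpy as np

a = np.array([0.0, np.nan, np.inf, -0.0, 1.0])
# -0.0 == 0, so only indices 1, 2 and 4 are nonzero
assert np.array_equal(np.nonzero(a)[0], [1, 2, 4])
```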
/**begin repeat
 *
 * #dtype = npy_bool, npy_byte, npy_byte, npy_uint16, npy_int16, npy_uint32, npy_int32, npy_uint64, npy_int64, npy_float, npy_double#
 * #name = bool, u8, i8, u16, i16, u32, i32, u64, i64, f32, f64#
Long lines, everywhere. Please stay below 80 characters, unless the overhang is small and wrapping seems untidy.
#define ptr_assignment1(ptr1, val1, stride1) *ptr1 = val1; ptr1 += stride1;
#define ptr_assignment2(ptr1, val1, stride1, ptr2, val2, stride2) ptr_assignment1(ptr1, val1, stride1) *ptr2 = val2; ptr2 += stride2;
To be honest, I can't quite parse what ptr_assignment2 means exactly. The first macro seems OK to just inline, and the second probably as well?
/**begin repeat1
 *
 * #layout=C,F#
Didn't we say not to deal with F right now?
bool to_increment = ((bool) *data);
idxs += ((int) to_increment);
added_count += ((int) to_increment);
data = (@dtype@ *) (((char*) data) + stride);
Data is typed, so += will work just fine. If this is C-contiguous, data++; will work. But I think it is good if this is triggered for all 1-D arrays.
assert_equal(nzs, 0)
Excessive white space. There should be one empty line between def's in classes. We are pretty strict on PEP 8.
assert_equal(np.count_nonzero(y_view), len(idxs_0))
nzs = 0
for i,j in zip(idxs_0, idxs_1):
    if y_view[i,j] == 0:
Suggested change:
-    if y_view[i,j] == 0:
+    if y_view[i, j] == 0:
for i,j in zip(idxs_0, idxs_1):
    if x_view[i,j] == 0:
        nzs += 1
assert_equal(nzs, 0)
You can just use assert nzs == 0 if nzs is a Python scalar.
for i in range(iters):  # check for slices
    x = ((2**33)*np.random.randn(10, 10)).astype(np.int32)
    x.flat[[2,6,9,15,21]] = 0
Suggested change:
-    x.flat[[2,6,9,15,21]] = 0
+    x.flat[[2, 6, 9, 15, 21]] = 0
param_names = ["dtype", "shape"]

def setup(self, dtype, size):
    self.x = np.random.randint(0,2,size=size).astype(dtype)
We also need benchmarks for "sparse" and "nonsparse" data.
That is, please generate a dataset which has very few nonzeros (maybe only a single 1 somewhere in the middle or at the end), and a dataset in which everything is 1.
It might also be good to seed this, just to stabilize the benchmarks a bit (although they tend to fluctuate absurdly often anyway).
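A seeded benchmark covering both extremes might look like this (hypothetical ASV-style class; the class and parameter names are my own, not from the PR):

```python
import numpy as np

class NonzeroSparsity:
    # hypothetical ASV benchmark: one lonely nonzero vs. all-ones vs. random
    params = ['sparse', 'dense', 'random']
    param_names = ['density']

    def setup(self, density):
        n = 1_000_000
        if density == 'sparse':
            x = np.zeros(n, dtype=np.int64)
            x[n // 2] = 1                   # single nonzero in the middle
        elif density == 'dense':
            x = np.ones(n, dtype=np.int64)  # everything nonzero
        else:
            rng = np.random.RandomState(0)  # seeded to stabilize the benchmark
            x = rng.randint(0, 2, size=n).astype(np.int64)
        self.x = x

    def time_nonzero(self, density):
        np.nonzero(self.x)
```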
while (added_count < nonzero_count && b < size_1) {
    *idxs_1 = b;
    npy_uintp to_increment = (npy_uintp) ((bool) *data);
    idxs_1 += (idxs_stride & (-to_increment));
Oh, forgot to press enter here I think. This deserved a 🤯 from me... I somewhat hope we could just do an if (*data != 0) and let the compiler do its magic, but if you say it's worthwhile I am happy. I did try with godbolt (I couldn't find the "share link" today), and I admit the compiler does produce less code with this trick.
(I would like to make sure it is worthwhile with the -O3 macro in use.)
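The trick in question advances the pointer by stride & -flag, which evaluates to stride when the element is nonzero and to 0 otherwise. A small model of the C expression in Python integers (my own illustration):

```python
def branchless_advance(ptr, stride, value):
    # mirrors `idxs_1 += idxs_stride & (-to_increment)` from the C loop
    flag = 1 if value != 0 else 0   # to_increment
    return ptr + (stride & -flag)   # -1 is all one-bits, -0 is zero

assert branchless_advance(100, 8, 5) == 108  # nonzero element: advance
assert branchless_advance(100, 8, 0) == 100  # zero element: stay put
```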
for i,j in zip(idxs_0, idxs_1):
    if y_view[i,j] == 0:
        nzs += 1
assert_equal(nzs, 0)
And one more comment that got lost: these tests seem very repetitive. Can you try to use pytest.mark.parametrize, or create a helper function like check_nonzero_result?
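A helper along the suggested lines could look like this (the name check_nonzero_result comes from the comment above; the body is my own sketch):

```python
import numpy as np
from numpy.testing import assert_equal

def check_nonzero_result(x):
    # replaces the repeated inline index-checking loops in the tests
    idxs = np.nonzero(x)
    assert_equal(len(idxs[0]), np.count_nonzero(x))
    for t in zip(*idxs):        # every returned index must hit a nonzero
        assert x[t] != 0

check_nonzero_result(np.array([[0, 1, 0], [2, 0, 3]]))
```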
@seberg It looks like there hasn't been any activity on this PR since March. Since I'd like to be able to use this optimization, would it make any sense for me to fork the repo this PR is from, attempt to address your feedback, and make a new PR?
@danielg1111 It is generally OK as long as the original work is credited, usually by working on top of the existing commits. It would smooth the way if @touqir14 weighed in and gave permission.
@danielg1111 @charris I haven't pushed some of my local updates yet, which I hope to do in the next few days. I have been a little busy and haven't worked on this for a while. @danielg1111, if you can wait a bit this PR should be completed.
A faster nonzero would definitely speed up some code I am writing to work in Avizo. Does anyone know if these fixes will be available anytime soon? Thanks!
@touqir14 @danielg1111 The PR has been stale for some time, but it would be nice to get it going again. Does either one of you want to continue with it at this moment? I checked with a recent …
I sincerely apologize to the NumPy team for dragging this PR on for so long. I have been quite busy with my work and some of my side projects. I have research paper submissions on May 3rd, and right after that I will make this my priority, @eendebakpt. Thank you for understanding.
@touqir14 No problem. Ping me once you start working, and I'll help review any updates.
Just to note, since we now have quite a bit of C++ code elsewhere, it might make sense to aim for C++. And maybe it would have been nicer from the start to just create a new file for it (even …
I am going to close the PR since it is stale. I do think it is a great idea, though, and very much worthwhile! Please do not hesitate to open a new PR based on this! But at this point it seems like a small project to continue, unfortunately. @seiko2plus just in case this grabs your interest. EDIT: Just to note, this was discussed on the triage call today.
This would be great to have. Shame that it is closed.
Since the PR has been stale for a long time, I intend to pick up the work. I will start by making PRs for the benchmarks and unit tests. Then I will make a PR for the most important case: C-contiguous float64 and int64. If someone else is still interested in picking this up, let me know.
This PR solves issue 11569. @tylerjereddy also requested optimizations for large boolean arrays. I have added optimizations to numpy.nonzero for contiguous and aligned numpy arrays. Generally, the speedup is at least 2 times and the maximum speed gain observed is about 8 times. Currently, the optimizations are exclusive to all int types, float types and bool type for arrays of dimension 1 to 3. If needed, optimizations for higher dimensional arrays can be added. Test cases have been added.
The following code illustrates some of the speed gains: