BUG: Make np.nonzero threading safe #28361
Conversation
It looks like the TSAN tests are failing on the new test you added.
Hmmm. For np.nonzero the result is not well-defined if the underlying data is changing. So should we care about a data race between the thread executing the np.nonzero and some other thread modifying the data? If the data is modified, our results are not valid anyway. (I am not familiar enough with TSAN to definitively conclude that this is happening, though.)
NumPy doesn't guarantee anything about concurrently modifying a shared array. We decided not to add locking for the free-threaded build since the same is true on the GIL-enabled build. IMO in the long run we need something like an immutable (maybe copy-on-write?) and thread-safe object similar to ndarray. Can you trigger the crash you saw without repeatedly writing to the array? Also, out of curiosity, how did you run across this?
Unless we have a way to more generally make array access like this safer, I think the test is just about "don't segfault when it happens". Is it maybe right to just skip the test in TSAN?!
This wasn't actual code; it was just a realization, when looking at the code, that it is clearly not thread safe.
The TSAN tests only run the test files that do "import threading".
You could make that logic a little smarter to pick up tests we want to intentionally exclude. We don't want to run the full test suite because it takes more than an hour with TSAN on the GitHub runners. If you do exclude the test, maybe add some comments saying that we know this is racy and shouldn't be run under TSAN? Another thing we could do is create a suppressions file and use it.
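For context, a ThreadSanitizer suppressions file is just a list of patterns, one per line. A minimal hypothetical sketch (the symbol name below is illustrative; the real entries depend on the stacks TSAN reports):

    # tools/ci/tsan_suppressions.txt (sketch)
    # suppress reported data races whose stack contains this symbol
    race:PyArray_Nonzero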
Hmm, it's not great that the TSAN tests ran for 6 hours before failing. I'm going to push a patch to the GitHub Actions configuration for that job to check if we can avoid that and still fail the test if there's a TSAN failure. Sorry for pushing to your PR branch...
Thanks so much for working on this and figuring out the suppressions. A few comments.
x = np.random.randint(4, size=10_000).astype(dtype)

def func(seed):
    x[::2] = np.random.randint(2)
I'm not actually sure what the best thing to do here is. It's probably better for each thread to get its own private RNG so there's no cross-talk or shared RNG state. I know the new numpy RNG infrastructure had a lot of careful thought put into it to make it possible to do stuff like this.
Maybe @rkern has a suggestion here?
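For what it's worth, a minimal sketch of the per-thread RNG pattern the newer infrastructure supports (names and sizes here are illustrative, not taken from this PR):

    import concurrent.futures
    import numpy as np

    n_threads = 4
    # spawn() yields statistically independent child seeds, so each thread gets
    # its own Generator and there is no shared RNG state between threads
    seeds = np.random.SeedSequence(1234).spawn(n_threads)
    rngs = [np.random.default_rng(s) for s in seeds]

    def worker(rng):
        # draw only from this thread's private Generator
        return rng.integers(0, 2, size=10)

    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(worker, rngs))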
I do not think the randomization is really important; we could also just flip between many non-zeros and many zeros in the array. I removed the seed argument.
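For reference, a sketch of that RNG-free variant (illustrative, not the exact test code):

    import numpy as np

    x = np.random.randint(4, size=10_000)

    # alternate the written value instead of drawing it randomly, so the array
    # flips between many non-zeros and many zeros across iterations
    for i in range(10):
        x[::2] = i % 2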
@@ -121,7 +121,7 @@ jobs:
      - name: Test
        run: |
          # These tests are slow, so only run tests in files that do "import threading" to make them count
-         TSAN_OPTIONS=allocator_may_return_null=1:halt_on_error=1 \
+         TSAN_OPTIONS="allocator_may_return_null=1:suppressions=tools/ci/tsan_suppressions.txt" \
maybe try using an absolute path on the runner?
I'll try that, assuming the absolute path is relatively stable.
You can use an environment variable to get the base directory: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#default-environment-variables
Ok, both GITHUB_WORKSPACE and RUNNER_WORKSPACE seem to point to the right location. Picking the first option.
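Concretely, that turns the suppressions path into something like this (a sketch; GITHUB_WORKSPACE is one of the default environment variables on GitHub-hosted runners):

    # equivalent to the inline prefix used in the workflow's run step
    export TSAN_OPTIONS="allocator_may_return_null=1:suppressions=$GITHUB_WORKSPACE/tools/ci/tsan_suppressions.txt"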
There might be two data races going on here: one in ...
The latter sounds better indeed.
Thanks @eendebakpt!
We add a test for np.nonzero under multi-threading and make np.nonzero safe under the CPython free-threading build. We want to ensure that concurrent invocations of the method on the same array do not crash or corrupt memory; correct results are not guaranteed.
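A minimal sketch of that kind of stress test (illustrative; the actual test added in the PR may differ):

    import threading
    import numpy as np

    x = np.random.randint(4, size=10_000)

    def writer():
        # keep mutating the array while other threads call np.nonzero on it
        for _ in range(100):
            x[::2] = np.random.randint(2)

    def reader():
        for _ in range(100):
            idx = np.nonzero(x)[0]
            # only sanity-check the result size; the indices need not reflect
            # a consistent snapshot of the array
            assert idx.size <= x.size

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()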
Also see #27519. An alternative approach would be to use locks to make the array read-only during the operation.