Description
I don't have a small, simple reproducer that crashes immediately, but I can make Python segfault on my Mac with the following script in the free-threaded build:
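import threading

import numpy as np

# Large object array: every element is a separate Python float object.
arr = np.ones(int(1e7)).astype('object')
n_threads = 20
# The barrier releases all threads at once to maximize contention.
barrier = threading.Barrier(n_threads)

def work():
    global arr
    barrier.wait()
    print(repr(arr))
    arr += 1  # unsynchronized in-place update of the shared array
    print(repr(arr))

threads = [threading.Thread(target=work) for _ in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(repr(arr))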
It doesn't segfault with smaller arrays or with fewer threads, so it looks like the timing needs to be pretty "lucky" to trigger the race condition.
This is really a symptom of the broader, preexisting lack of thread safety in NumPy. Now that the GIL can be disabled, that gap can segfault CPython, because it allows thread-unsafe access to the Python objects stored in the array.
We might be able to use the critical section API to lock around the owning array; see #26157 (comment) and #26157 (comment) for related discussion and possible complications with that approach.
In the short term, the plan is to document that mutating arrays shared between threads is unsafe, and that mutating object arrays shared between threads may cause crashes if array elements are mutated simultaneously.
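
Until something like that exists inside NumPy, user code can presumably avoid the race by serializing every mutation of the shared array behind its own lock. The sketch below is only an illustration of that idea, not the critical-section approach and not anything NumPy does internally: `arr_lock` is a hypothetical name, the array is shrunk so the script finishes quickly, and it assumes that all code touching `arr` takes the same lock.

```python
import threading

import numpy as np

# Smaller array than in the reproducer so the sketch runs quickly.
arr = np.ones(1000).astype('object')
# Hypothetical user-level lock; NumPy knows nothing about it.
arr_lock = threading.Lock()
n_threads = 20
barrier = threading.Barrier(n_threads)


def work():
    global arr
    barrier.wait()
    # Only one thread at a time may run the in-place update.
    with arr_lock:
        arr += 1


threads = [threading.Thread(target=work) for _ in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each element started at 1.0 and was incremented once per thread.
print(arr[0])
```

This trades away all parallelism on the mutation itself, and any reader that accesses `arr` without holding the lock can still race with a writer.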