Description
Crash report
What happened?
This is a bit of a tricky situation, but it is real and impacting my ability to use tracemalloc. As background, I've added code to Polars to make it record all of its allocations in tracemalloc, and this is enabled in debug builds. This then allows writing unit tests that check memory usage, which is very useful in ensuring high memory usage is fixed, and making sure it doesn't get high again.
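To make the use case concrete, here is a minimal sketch of the kind of memory check described above, using only the standard tracemalloc module. The helper name peak_bytes is hypothetical and is not part of Polars or CPython; it also assumes nothing else in the process relies on tracemalloc staying enabled, since it stops tracing when done.

```python
import tracemalloc


def peak_bytes(func, *args, **kwargs):
    """Return the peak traced memory (in bytes) while running func.

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()
    try:
        # Ignore whatever was allocated before the call under test.
        tracemalloc.reset_peak()
        func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

A unit test could then assert that the peak stays under some budget, for example `assert peak_bytes(load_data) < 50 * 1024 * 1024`, where load_data is whatever function is being measured.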
Unfortunately, I'm hitting a situation where tracemalloc causes segfaults in multi-threaded situations. I believe that this is a race condition between PyTraceMalloc_Track() in a new non-Python thread that does not hold the GIL, and tracemalloc.stop() being called in another thread. My hypothesis in detail:
- Thread 1 does tracemalloc.start().
- Thread 2 does not hold the GIL.
- Thread 2 calls PyTraceMalloc_Track().
- Thread 2 checks if tracemalloc is enabled, the answer is yes.
- Thread 2 starts acquiring the GIL (see PyTraceMalloc_Track implementation). As part of acquiring the GIL memory needs to be allocated.
- Thread 2 therefore once again checks if tracemalloc is enabled (I haven't actually read source code enough to verify if this part is true, but the coredump suggests it probably is). The answer is yes.
- At this point, Thread 1 calls tracemalloc.stop().
- Thread 2 continues as if tracemalloc is enabled, but it is not!
- BOOM
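The interleaving in the list above can be modelled in pure Python. This is only a toy sketch of the hypothesis, not CPython's implementation: the RacyTracer class and its Event objects are made up, and the events force the "check, then stop, then use" ordering deterministically. In Python the failure surfaces as a TypeError, where the C code would dereference freed state and segfault.

```python
import threading


class RacyTracer:
    """Hypothetical toy model of a tracer with a check-then-act race."""

    def __init__(self):
        self.traces = {}                   # not None means "enabled"
        self.checked = threading.Event()   # set once the enabled check passed
        self.stopped = threading.Event()   # set once stop() has run

    def stop(self):
        self.traces = None                 # tear down the trace table

    def track(self, ptr, size):
        if self.traces is None:            # step 1: is tracing enabled?
            return -2
        self.checked.set()
        # Stand-in for the slow part (acquiring the GIL); the other
        # thread gets to call stop() while we wait here.
        self.stopped.wait()
        self.traces[ptr] = size            # step 2: use state that is now gone
        return 0


def demo():
    tracer = RacyTracer()
    outcome = []

    def worker():
        try:
            outcome.append(tracer.track(10, 1))
        except TypeError as exc:
            outcome.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    tracer.checked.wait()   # worker has passed the enabled check
    tracer.stop()           # "Thread 1 calls tracemalloc.stop()"
    tracer.stopped.set()    # let the worker continue
    thread.join()
    return outcome[0]
```

Calling demo() returns the TypeError raised in the worker, because the table was removed between the check and the use.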
If this hypothesis is correct, the solution would be for GIL acquisition to bypass tracemalloc altogether if it allocates; it's not like it allocates a lot of memory, so not tracking it is fine. This may be difficult in practice, so another approach would involve having an additional lock so there's no race condition around checking if tracemalloc is enabled.
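As a rough illustration of the second approach, the sketch below puts the "is it enabled?" check and the use of the trace table under one lock, so stop() cannot run in between. The LockedTracer class is a hypothetical toy model and does not reflect how CPython actually structures its locking; the -2 return value mirrors what PyTraceMalloc_Track() returns when tracemalloc is disabled.

```python
import threading


class LockedTracer:
    """Hypothetical toy model: check and use share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._traces = None  # None means "tracing disabled"

    def start(self):
        with self._lock:
            if self._traces is None:
                self._traces = {}

    def stop(self):
        with self._lock:
            self._traces = None

    def track(self, ptr, size):
        # The enabled check and the table update happen atomically
        # with respect to stop().
        with self._lock:
            if self._traces is None:
                return -2
            self._traces[ptr] = size
            return 0


def run_demo(n_threads=50):
    tracer = LockedTracer()
    tracer.start()
    results = []
    results_lock = threading.Lock()

    def worker(i):
        result = tracer.track(i, 1)
        with results_lock:
            results.append(result)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    tracer.stop()  # races with the workers, but safely
    for thread in threads:
        thread.join()
    return results
```

Every worker either records its trace (0) or sees tracing as disabled (-2); none can observe a half-torn-down table.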
Here is a stack trace from a coredump from the reproducer (see below) that led me to the above hypothesis:
#0 0x00000000005f0482 in traceback_new () at ../Python/tracemalloc.c:371
#1 0x00000000005f0083 in tracemalloc_add_trace (domain=0, ptr=132539872185888, size=336) at ../Python/tracemalloc.c:471
#2 0x00000000006930db in tracemalloc_alloc (elsize=336, nelem=<optimized out>, ctx=0x9ec4f8 <_PyRuntime+10136>, use_calloc=<optimized out>) at ../Python/tracemalloc.c:541
#3 tracemalloc_raw_alloc (use_calloc=<optimized out>, ctx=0x9ec4f8 <_PyRuntime+10136>, nelem=<optimized out>, elsize=336) at ../Python/tracemalloc.c:715
#4 0x0000000000639a52 in alloc_threadstate (interp=0x9ff6b0 <_PyRuntime+88400>) at ../Python/pystate.c:1452
#5 new_threadstate (interp=0x9ff6b0 <_PyRuntime+88400>, whence=4) at ../Python/pystate.c:1563
#6 0x0000000000480ef0 in PyGILState_Ensure () at ../Python/pystate.c:2766
#7 0x00000000005203fc in PyTraceMalloc_Track (domain=123, ptr=10, size=1) at ../Python/tracemalloc.c:1318
#8 0x0000788b7088774c in tracemalloc_repro::in_thread () at src/lib.rs:23
To run the reproducer you will need to pip install rustimport and have Rust installed. (I tried with Cython, had a hard time, gave up.)
Here's the Python file:
import gc
import time
import tracemalloc
import rustimport.import_hook
# rustimport will automatically compile tracemalloc_repro.rs into an extension:
import tracemalloc_repro
tracemalloc.start()
# This launches ~50 threads that run in parallel to this one:
tracemalloc_repro.run()
# This happens in parallel to the new threads running:
tracemalloc.stop()
gc.collect()
time.sleep(10)
And here is the Rust file, you should call it tracemalloc_repro.rs:
// rustimport
//: [package]
//: name = "tracemalloc_repro"
//: version = "0.1.0"
//: edition = "2021"
//:
//: [lib]
//: name = "tracemalloc_repro"
//: crate-type = ["cdylib"]
//:
//: [dependencies]
//: pyo3 = {version = "0.23", features = ["extension-module"]}
//: libc = "0.2"
use pyo3::prelude::*;
use libc::{c_int, c_uint, size_t, uintptr_t};

extern "C" {
    fn PyTraceMalloc_Track(domain: c_uint, ptr: uintptr_t, size: size_t) -> c_int;
}

fn in_thread() {
    let result = unsafe { PyTraceMalloc_Track(123, 10, 1) };
    println!("Result of tracking: {result}");
}

#[pyfunction]
fn run(py: Python) {
    py.allow_threads(|| {
        // With GIL released, run function in a thread.
        for _ in 0..50 {
            let _ = std::thread::spawn(in_thread);
        }
    });
}

#[pymodule]
fn tracemalloc_repro(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(run, m)?)?;
    Ok(())
}
You can reproduce by running the Python file above (saved as repro.py). Because this is a race condition, you may need to run it a few times; I had more consistent crashes with Python 3.12, but it does crash on Python 3.13. You may need to tweak the number 50 above to make it happen.
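Since the crash is intermittent, a small driver can rerun the script until it segfaults. The sketch below is one way to do that on Linux; the run_until_crash helper and the default of 20 attempts are assumptions for illustration, not part of the original reproducer.

```python
import signal
import subprocess
import sys


def run_until_crash(cmd, attempts=20):
    """Run cmd up to `attempts` times.

    Returns the 1-based attempt number on which the child died from
    SIGSEGV, or None if no run segfaulted. Hypothetical helper.
    """
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a return code of -N means the child was killed by signal N.
        if proc.returncode == -signal.SIGSEGV:
            return attempt
    return None
```

For example, `run_until_crash([sys.executable, "repro.py"])` returns the attempt on which the reproducer crashed, or None if it never did within the attempt budget.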
CPython versions tested on:
3.12, 3.13
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.13.1 (main, Dec 4 2024, 08:54:15) [GCC 13.2.0]
Linked PRs
- gh-128679: fix race condition in tracemalloc #128695
- gh-128679: Fix tracemalloc.stop() race condition #128710
- gh-128679: Redesign tracemalloc locking #128888
- gh-128679: Fix tracemalloc.stop() race conditions #128893
- [3.13] gh-128679: Fix tracemalloc.stop() race conditions #128897
- gh-128679: Skip test_tracemalloc_track_race() on debug build #128988
- [3.12] gh-128679: Fix tracemalloc.stop() race conditions (#128897) #129022
- gh-128679: Use _PyThreadState_GET() in tracemalloc.c #129126
- [3.13] gh-128679: Clear the ref trace in _PyTraceMalloc_Stop() #129258