Description
Summary
I'm working on some static analysis tools that leverage RustPython's AST parser. The basic setup is that I enumerate a bunch of Python files, then in parallel (via Rayon), read them from disk, parse them with RustPython, and perform various operations on the AST.
This generally works great! However, I've noticed that the parallelized parsing can lead to irrecoverable errors (panics, segmentation faults, etc.) when the source Python files themselves contain certain contents, especially nested f-strings.
Here's an example snippet that fails for me occasionally, maybe one in twenty times, typically with Trace/BPT trap: 5, though depending on how exactly I structure the code, I can also get a Segmentation fault: 11:
use std::fs;
use std::path::Path;

use rayon::prelude::*;
use rustpython_parser::ast::Suite;
use rustpython_parser::parser;

fn main() {
    [
        Path::new("resources/test/broken/__init__.py").to_path_buf(),
        Path::new("resources/test/broken/make_string.py").to_path_buf(),
    ]
    .par_iter()
    .map(|path| {
        // Read and parse each file on a Rayon worker thread.
        let contents = fs::read_to_string(path).unwrap();
        parser::parse_program(&contents).unwrap()
    })
    .collect::<Vec<Suite>>();
}
...where __init__.py is empty and make_string.py looks like (sorry, weirdly specific example, but it's been difficult to pinpoint the exact issue):
where_statement = f"""
{' OR '.join([f'column_name = "{field}"' for field in ['plate', 'well']])}
"""
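To make the reproduction easier to set up, the two input files can be generated from Rust. This is only a convenience sketch: `write_fixtures` and `MAKE_STRING_PY` are hypothetical names introduced here, and the file contents are copied from the snippet above exactly as shown.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Contents of `make_string.py`, copied from the report.
/// A raw string is used so the quotes and braces need no escaping.
const MAKE_STRING_PY: &str = r#"where_statement = f"""
{' OR '.join([f'column_name = "{field}"' for field in ['plate', 'well']])}
"""
"#;

/// Writes an empty `__init__.py` and the `make_string.py` fixture into `dir`.
/// Hypothetical helper for this report, not part of any library.
pub fn write_fixtures(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    fs::write(dir.join("__init__.py"), "")?;
    fs::write(dir.join("make_string.py"), MAKE_STRING_PY)?;
    Ok(())
}
```

Calling `write_fixtures(Path::new("resources/test/broken"))` before the Rayon snippet should produce the same directory layout used in the example.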
A few observations:
- On the larger Python codebase I'm using for development, I got a stack overflow 100% of the time until I removed the nested f-strings from two specific files (now it works without error every time). If helpful, I can probably come up with an example that errors consistently, but it may be more involved.
- If I remove Rayon, and do everything serially, the code never errors.
- If I run Rayon over a single file (e.g., remove the __init__.py from the above snippet), the code never errors. (I don't know if this is due to a Rayon optimization to not spin out a thread in that case, or something else.)
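
One way to probe the observations above is to check whether thread stack size plays a role. That is an unconfirmed guess, not a diagnosis, and it does not obviously explain the single-file case. The sketch below uses only the standard library: `run_with_stack` is a hypothetical helper that runs a closure on a thread with an explicit stack size, and `nested_depth` is a stand-in for a recursive parse, not RustPython code.

```rust
use std::thread;

/// Runs `f` on a freshly spawned thread with an explicit stack size (in bytes).
/// Returns `Ok(value)` on success, or `Err(payload)` if `f` panicked.
/// Note: a genuine stack overflow aborts the process instead of panicking,
/// so it will not show up as an `Err` here.
pub fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> thread::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
}

/// Stand-in for a recursive parse: stack usage grows with nesting depth.
pub fn nested_depth(levels: u64) -> u64 {
    if levels == 0 {
        0
    } else {
        1 + nested_depth(levels - 1)
    }
}
```

To apply this to the reproduction, the closure passed to `run_with_stack` would read a file and call `parser::parse_program` in place of `nested_depth`, with `stack_bytes` varied between runs. If the failure rate changes with stack size, that would point toward stack exhaustion; if not, the cause is likely elsewhere. Rayon's `ThreadPoolBuilder` also has a `stack_size` setting that could be used for the same experiment inside the pool.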