Irrecoverable errors when running AST parser in parallel #4099

Open
@charliermarsh

Summary

I'm working on some static analysis tools that leverage RustPython's AST parser. The basic setup is that I enumerate a bunch of Python files, then in parallel (via Rayon), read them from disk, parse them with RustPython, and perform various operations on the AST.

This generally works great! However, I've noticed that the parallelized parsing can lead to irrecoverable errors (panics, segmentation faults, etc.) when the source Python files themselves contain certain contents, especially nested f-strings.

Here's an example snippet that fails for me occasionally, maybe one in twenty times, typically with Trace/BPT trap: 5, though depending on how exactly I structure the code, I can also get a Segmentation fault: 11:

use std::fs;
use std::path::Path;

use rayon::prelude::*;
use rustpython_parser::ast::Suite;
use rustpython_parser::parser;

fn main() {
    // Two inputs: an empty file and the file containing the nested f-string.
    [
        Path::new("resources/test/broken/__init__.py").to_path_buf(),
        Path::new("resources/test/broken/make_string.py").to_path_buf(),
    ]
    // Read and parse each file on Rayon's thread pool.
    .par_iter()
    .map(|path| {
        let contents = fs::read_to_string(path).unwrap();
        parser::parse_program(&contents).unwrap()
    })
    .collect::<Vec<Suite>>();
}

...where __init__.py is empty and make_string.py looks like this (sorry for the weirdly specific example, but it's been difficult to pinpoint the exact issue):

where_statement = f"""
{' OR '.join([f'column_name = "{field}"' for field in ['plate', 'well']])}
"""

A few observations:

  • On the larger Python codebase I'm using for development, I got a stack overflow 100% of the time until I removed the nested f-strings from two specific files (now it works without error every time). If helpful, I can probably come up with an example that errors consistently, but it may be more involved.
  • If I remove Rayon, and do everything serially, the code never errors.
  • If I run Rayon over a single file (e.g., remove the __init__.py from the above snippet), the code never errors. (I don't know if this is due to a Rayon optimization to not spin out a thread in that case, or something else.)
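One hypothesis that fits the stack overflow and the serial-versus-parallel difference, though nothing above confirms it, is that the parser recurses deeply on nested f-strings and exhausts the smaller stack of a worker thread. Rust's standard library currently defaults spawned threads to 2 MiB on Tier-1 platforms, while the main thread's stack is set by the OS and is often larger. The sketch below uses only the standard library to show how recursion depth interacts with a thread's stack size. `recurse` is a hypothetical stand-in for a recursive parse, not RustPython code, and `run_with_stack` is an illustrative helper name.

```rust
use std::hint::black_box;
use std::thread;

// Hypothetical stand-in for a recursive parse: each level keeps a 1 KiB
// buffer on its frame so that every level of recursion consumes stack.
fn recurse(depth: usize) -> usize {
    let frame = [0u8; 1024];
    // Keep the optimizer from discarding the buffer.
    black_box(&frame);
    if depth == 0 {
        0
    } else {
        1 + recurse(depth - 1)
    }
}

// Run `recurse(depth)` on a freshly spawned thread with an explicit stack size.
// Note: a real stack overflow aborts the whole process; it is not a panic,
// so `join` cannot report it as an `Err`.
fn run_with_stack(stack_bytes: usize, depth: usize) -> usize {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || recurse(depth))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}
```

Calling `run_with_stack(64 * 1024 * 1024, 10_000)` gives the recursion plenty of room, whereas the same depth on a much smaller stack would be expected to abort the process. For the Rayon snippet above, the analogous experiment would be to build the pool with `rayon::ThreadPoolBuilder::stack_size` and see whether the failures disappear. That would support the hypothesis but not prove it.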

Metadata

Assignees: No one assigned
Labels: A-compiler (Area: compiler), C-bug (Something isn't working)