@fullzer4 fullzer4 commented May 26, 2025

Add File System Support via pyodide.FS

This PR implements comprehensive file system operations in the PyodideSandbox, addressing issue #34 (Allow attaching files) and issue #26 (Preserve newlines when printing).

Key Features

File System Operations

  • File Attachment: Pre-load files into sandbox with attach_file() and attach_files()
  • Standard Python I/O: Full support for open(), Path(), and standard file operations
  • Multiple File Types: Handle both text and binary files (CSV, images, etc.)
  • Directory Support: Organize files in nested directory structures
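
As a rough illustration of the standard I/O listed above, code running in the sandbox could read an attached file with nothing but the standard library. The sketch below is plain Python and uses a temporary directory as a stand-in for the sandbox filesystem; the nested path, file name, and CSV rows are made up for the example and do not come from this PR.

```python
import csv
import tempfile
from pathlib import Path

# A temporary directory stands in for the sandbox filesystem (assumption).
with tempfile.TemporaryDirectory() as root:
    # Nested directory plus file, created with pathlib.
    data_file = Path(root) / "data" / "sales.csv"
    data_file.parent.mkdir(parents=True)
    # Made-up rows, only to have something to aggregate.
    data_file.write_text("category,revenue\nbooks,10\ngames,5\nbooks,7\n")

    # Read it back with plain open() and the csv module.
    totals = {}
    with open(data_file, newline="") as handle:
        for row in csv.DictReader(handle):
            category = row["category"]
            totals[category] = totals.get(category, 0) + int(row["revenue"])

print(totals)
```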

Bug Fixes

Implementation Details

  • Added FileSystemOperation interface for defining file operations
  • Implemented file system operations in main.ts using pyodide.FS
  • Migrated from BaseTool to StructuredTool for dynamic descriptions

Usage Example

# pip install langgraph-codeact "langchain[anthropic]"
import asyncio
from langchain_sandbox import PyodideSandboxTool
from langgraph.prebuilt import create_react_agent

# Define sandbox tool with filesystem support
sandbox_tool = PyodideSandboxTool(
    enable_filesystem=True,
    allow_net=True,
)

# Attach sample data
sales_data = """...csv_data"""

sandbox_tool.attach_file("sales.csv", sales_data)

# Create agent with sandbox tool
agent = create_react_agent("anthropic:claude-3-7-sonnet-latest", [sandbox_tool])

query = """Please analyze the sales data and tell me:
1. What is the total revenue by category?
2. Which region has the highest sales?
3. What are the top 3 best-selling products by revenue?

Use pandas to read the CSV file and perform the analysis."""

async def run_agent(query: str):
    async for chunk in agent.astream({"messages": query}):
        print(chunk)

if __name__ == "__main__":
    asyncio.run(run_agent(query))

References

@fullzer4 fullzer4 changed the title Add File System Support via pyodide.FS [WIP] Add File System Support via pyodide.FS May 26, 2025
@fullzer4

This comment was marked as outdated.

@fullzer4 fullzer4 changed the title [WIP] Add File System Support via pyodide.FS Add File System Support via pyodide.FS May 28, 2025
@fullzer4
Author

fullzer4 commented May 28, 2025

Update: Feature Complete & Ready for Review

Quick Summary

StructuredTool Migration

The switch to StructuredTool enables dynamic descriptions that automatically list available files, improving developer experience by providing clear visibility into attached resources.

Technical Details

This implementation focuses only on MEMFS support as documented in the Pyodide File System documentation.


Ready for review! 🚀 Open to feedback and suggestions for improvements.

@eyurtsev @vbarda Could you please review when you have a chance? Thank you!

@eyurtsev eyurtsev self-assigned this May 28, 2025
Collaborator

@eyurtsev eyurtsev left a comment

@fullzer4 this PR is really amazing. Thank you for putting all of this together! I left a few questions about the extra entrypoints in the code and would like to figure out how to avoid them if possible.

Thanks also for fixing the various issues in the existing code! You caught more than one bug!

README.md (resolved)
examples/react_agent_with_csv.py (resolved)
libs/pyodide-sandbox-js/main.ts (resolved)
libs/sandbox-py/langchain_sandbox/pyodide.py (outdated, resolved)
libs/sandbox-py/langchain_sandbox/pyodide.py (outdated, resolved)
@eyurtsev
Collaborator

I think we have to pass the file bytes via stdin rather than as an argument, since most OSs impose a maximum size on the argument values that can be passed.
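
To make the stdin point concrete, here is a minimal sketch of piping a large payload to a child process. The child is a small Python one-liner standing in for the Deno process (an assumption for the sake of a self-contained example), and the 5 MiB size is arbitrary.

```python
import subprocess
import sys

# Stand-in child program: read all of stdin as raw bytes and print the count.
CHILD_CODE = "import sys; data = sys.stdin.buffer.read(); print(len(data))"


def send_via_stdin(payload: bytes) -> int:
    """Pipe payload to a child process over stdin; return the byte count it saw."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=payload,        # delivered through a pipe, not the command line
        capture_output=True,
        check=True,
    )
    return int(result.stdout.decode().strip())


# A payload of this size would typically exceed command-line length limits,
# but a pipe carries it without trouble.
payload = b"x" * (5 * 1024 * 1024)
received = send_via_stdin(payload)
print(received)
```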

@fullzer4
Author

fullzer4 commented Jun 2, 2025

@eyurtsev I've completed the filesystem implementation with stdin. It's working well and solves the CLI argument size limitations we discussed.

I chose a binary protocol over JSON stdin for file transfer, implemented with ReadableStream for efficient binary data handling, because it preserves binary data integrity and avoids "Maximum call stack size exceeded" errors that occur when sending large files as JSON through stdin. Would appreciate your thoughts on this approach.
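
For readers unfamiliar with the idea, a binary stdin protocol usually means length-prefixed frames. The sketch below shows one possible framing in Python; the header layout and the function names `encode_file` and `decode_files` are hypothetical and are not taken from this PR's wire format.

```python
import io
import struct
from typing import BinaryIO, Dict

# Hypothetical frame layout (an assumption, not the PR's actual format):
#   4-byte big-endian path length | 4-byte big-endian content length
#   | path (UTF-8) | raw content bytes
HEADER = struct.Struct(">II")


def encode_file(path: str, content: bytes) -> bytes:
    """Serialize one file as a single length-prefixed frame."""
    path_bytes = path.encode("utf-8")
    return HEADER.pack(len(path_bytes), len(content)) + path_bytes + content


def _read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly size bytes or fail if the stream ends early."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended in the middle of a frame")
    return data


def decode_files(stream: BinaryIO) -> Dict[str, bytes]:
    """Read frames until EOF and return a mapping of path to content."""
    files: Dict[str, bytes] = {}
    while True:
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of stream between frames
        if len(header) != HEADER.size:
            raise EOFError("truncated frame header")
        path_len, content_len = HEADER.unpack(header)
        path = _read_exact(stream, path_len).decode("utf-8")
        files[path] = _read_exact(stream, content_len)
    return files


# Round trip: binary content survives untouched, with no JSON or base64 step.
frames = encode_file("sales.csv", b"a,b\n1,2\n") + encode_file("logo.png", bytes(range(256)))
decoded = decode_files(io.BytesIO(frames))
```

Because each frame carries explicit lengths, the reader never has to parse or escape the content, which is the property that makes this kind of framing attractive for large or binary files.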

I'm already working on the output functionality with a get_file method to extract files from the sandbox after execution. Will submit this in a separate PR once it's ready and tested.

Regarding the permissions flags, I had to make changes to fix critical errors like:

Error: Requires write access to "/home/fullzer4/path/node_modules/.deno/pyodide@0.27.5/node_modules/pyodide/micropip-0.8.0-py3-none-any.whl"

and

NotCapable: Requires read access to "/path/node_modules/.deno/pyodide@0.27.5/node_modules/pyodide/pyodide.asm.wasm"

The approach follows the principle of least privilege while ensuring everything works properly.

For the description part, I implemented a template-based system with _description_template and _build_description() that dynamically shows available files to the LLM:

A secure Python code sandbox with filesystem support...

ATTACHED FILES AVAILABLE:
  • data.csv
  • config.json

These files are already loaded and ready to use...

The implementation now guarantees that when users provide custom descriptions, they are respected exactly as specified, while still supporting dynamic file information through an optional placeholder system.
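
A minimal sketch of how such a placeholder-based description could behave is shown below. The placeholder name `{available_files}`, the default template text, and the standalone `build_description` function are all assumptions for illustration; the PR's `_description_template` and `_build_description()` may differ.

```python
from typing import List

# Hypothetical placeholder and default template (assumptions, not from the PR).
PLACEHOLDER = "{available_files}"
DEFAULT_TEMPLATE = "A secure Python code sandbox with filesystem support.\n" + PLACEHOLDER


def build_description(template: str, files: List[str]) -> str:
    """Fill the placeholder with a file listing; leave other text untouched."""
    if PLACEHOLDER not in template:
        # A custom description without the placeholder is returned exactly as given.
        return template
    if files:
        listing = "ATTACHED FILES AVAILABLE:\n" + "\n".join(f"  • {name}" for name in files)
    else:
        listing = ""
    # str.replace avoids str.format, so stray braces in a custom description are safe.
    return template.replace(PLACEHOLDER, listing)


default_text = build_description(DEFAULT_TEMPLATE, ["data.csv", "config.json"])
custom_text = build_description("Run Python code. Braces like {x} are fine.", ["data.csv"])
print(default_text)
```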

I kept the filesystem methods (attach_file, create_directory, etc.) as they provide a clean, intuitive API. These methods make working with sandbox files much simpler than reimplementing these operations in user code.

I've also implemented the constructor-based file attachment you suggested, so users can now provide files directly when creating the sandbox tool:

sandbox_tool = PyodideSandboxTool(
    allow_net=True,
    files={
        "sales.csv": sales_data
    }
)

I decided to maintain both approaches - constructor-based initialization and method-based file operations - because they serve different use cases. The constructor approach is perfect for setting up files at creation time in a clean, declarative style, while the method-based approach allows for more dynamic file manipulation during the tool's lifecycle. This flexibility gives users the best of both worlds without forcing them into a single pattern.

Let me know if you'd like me to adjust anything before merging.

@fullzer4
Author

fullzer4 commented Jun 4, 2025

Hi @eyurtsev, I hope you're doing well! I just wanted to gently check in on the filesystem support PR when you have a moment. I'm eager to hear your thoughts. Thank you so much!

Collaborator

@eyurtsev eyurtsev left a comment

Left a few questions, and I'd like to remove the mutability and the various helper methods on the Python side.

I can take care of the changes on the python side if you'd like me to!

If you want to get the changes in more quickly, we can also break this PR into smaller ones (e.g., just tackling the typescript side first.)

libs/sandbox-py/langchain_sandbox/pyodide.py (outdated, resolved)
libs/sandbox-py/langchain_sandbox/pyodide.py (outdated, resolved)
libs/pyodide-sandbox-js/main.ts (outdated, resolved)
libs/sandbox-py/langchain_sandbox/pyodide.py (outdated, resolved)
@fullzer4
Author

fullzer4 commented Jun 5, 2025

Left a few questions, and I'd like to remove the mutability and the various helper methods on the Python side.

I can take care of the changes on the python side if you'd like me to!

If you want to get the changes in more quickly, we can also break this PR into smaller ones (e.g., just tackling the typescript side first.)

Thanks for offering to help! I'm happy to handle the remaining changes on the Python side myself - there's no rush on my end to get these changes in immediately. I've already addressed some of your review points with recent commits (specifically the unnecessary filesystem operations and permission model reversion).

If you prefer breaking this into smaller PRs as you suggested, I'm open to that approach too, but I don't want you to worry about making the changes yourself unless you specifically want to. I'm comfortable implementing the remaining feedback items based on your guidance.

@fullzer4
Author

fullzer4 commented Jun 9, 2025

Hi @eyurtsev,

Just checking in on the recent suggestions regarding the PR:

  • Regarding the helper methods that add mutability
  • About using Pydantic’s model_post_init

I left a comment marked "Left a few questions..." and would appreciate it if you could review the points I raised when you have a chance. No rush—just a reminder so the PR doesn't get stale.

@rayshen92

Hi @fullzer4,

Thank you very much for submitting this PR. I tried running the example locally but ran into the following error:

Error during execution:
Traceback (most recent call last):
  File "/lib/python312.zip/_pyodide/_base.py", line 597, in eval_code_async
    await CodeRunner(
  File "/lib/python312.zip/_pyodide/_base.py", line 411, in run_async
    coroutine = eval(self.code, globals, locals)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<exec>", line 1, in <module>
FileNotFoundError: [Errno 44] No such file or directory: 'input.txt'

Here’s the code I used:

files = {
    "input.txt": "Hello world!"
}

# Define sandbox tool with filesystem support
sandbox_tool = PyodideSandboxTool(
    files=files,
    allow_net=True,
)

# Create an agent with the sandbox tool
agent = create_react_agent("anthropic:claude-3-7-sonnet-latest", [sandbox_tool])

query = "display the content of input.txt"

async def run_agent(query: str):
    async for chunk in agent.astream({"messages": query}):
        print(chunk)

if __name__ == "__main__":
    asyncio.run(run_agent(query))

Could you please advise how to resolve this? Did I miss a configuration step, or is there something I need to adjust in the sandbox setup?

Thank you for your help!

@rayshen92

PKG_NAME = "jsr:@langchain/pyodide-sandbox@0.0.4"
I changed this to a local path, and it works!

@fullzer4
Author

fullzer4 commented Jun 10, 2025

Hi @rayshen92,

The error occurred because the system was referencing an older version of the pyodide-sandbox-js component (e.g., "jsr:@langchain/pyodide-sandbox@0.0.4"). By patching the PKG_NAME to point to the local main.ts file, we ensure that the updated, local version is used to test both components together.

For example, in the tests we apply the following patch:

@pytest.fixture
def pyodide_package(monkeypatch: pytest.MonkeyPatch) -> None:
    """Patch PKG_NAME to point to a local deno typescript file."""
    if os.environ.get("RUN_INTEGRATION", "").lower() == "true":
        # Skip this test if running in integration mode
        return
    local_script = str(current_dir / "../../../pyodide-sandbox-js/main.ts")
    monkeypatch.setattr("langchain_sandbox.pyodide.PKG_NAME", local_script)

Without this patch, you would be using the old version. Alternatively, to work without changing the PKG_NAME in tests, a new version would need to be generated—for example:

PKG_NAME = "jsr:@langchain/pyodide-sandbox@0.0.7"  # new version example

This ensures that the correct version is used, which is why the error occurred when referencing the outdated component.

Hope this clears things up!

@fullzer4
Author

Hi @eyurtsev,

Following up on my previous message about the PR feedback. I wanted to let you know that the output files functionality is now complete.

I'd like to wrap up this PR so I can focus on implementing other library features and get this version released. I'm still waiting for your feedback on those specific points I mentioned earlier (the helper methods for mutability and the Pydantic model_post_init approach).

Once we resolve those questions, I think we'll be in good shape to merge this and move forward with the next set of features.

Thanks for your time!

@fullzer4
Author

fullzer4 commented Jul 4, 2025

Hi @eyurtsev,

Quick update: I revisited the code in production and confirmed your earlier point—the mutability helpers weren’t adding real value. I’ve now removed them, made the models fully immutable, and shifted all directory/file handling into the constructor.
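
One way to picture the immutable, constructor-only design is sketched below. It swaps in a frozen standard-library dataclass rather than the Pydantic model discussed in this thread, and the class name `SandboxFiles` and its normalization rules are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from types import MappingProxyType
from typing import Mapping, Union


@dataclass(frozen=True)
class SandboxFiles:
    """Hypothetical immutable holder for files supplied at construction time."""

    files: Mapping[str, Union[str, bytes]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Normalize everything to bytes once, up front.
        normalized = {
            path: content.encode("utf-8") if isinstance(content, str) else bytes(content)
            for path, content in self.files.items()
        }
        # frozen=True blocks normal assignment, so bypass it once during init
        # and store a read-only view of the normalized mapping.
        object.__setattr__(self, "files", MappingProxyType(normalized))


config = SandboxFiles(files={"sales.csv": "a,b\n1,2\n"})

try:
    config.files = {}  # rebinding the attribute is rejected
except FrozenInstanceError:
    print("attribute is frozen")

try:
    config.files["new.txt"] = b""  # mutating the mapping is rejected too
except TypeError:
    print("mapping is read-only")
```

With no mutating helpers, everything the sandbox will see is fixed when the object is created, which keeps repeated tool invocations predictable.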

Let me know if there’s anything else you’d like tweaked; otherwise I think we’re ready to merge.

Thanks again for the guidance!

@eyurtsev
Collaborator

eyurtsev commented Jul 9, 2025

@fullzer4 awesome, sorry for the delay in responding. I'll likely commandeer the PR to take over the changes.

@brisacoder

This would be a welcome feature. Very recently I switched back to running LLM-generated Python code in containers for the very reason that this sandbox cannot read from or write to the file system. Many of my LangGraph applications read CSV/XLS files, analyze data, and save reports, among other tasks that need FS access.

@fullzer4 fullzer4 closed this Aug 5, 2025
@fullzer4 fullzer4 reopened this Aug 5, 2025
@sreeram004

Thanks @fullzer4 for this. This is a really important, must-have feature to fully support the goal of the package: safe execution of LLM-generated code.

@eyurtsev Do you think this can be prioritised?

Thanks again

Development

Successfully merging this pull request may close these issues.

  • Allow attaching files
  • Preserve newlines when printing inside the sandbox

5 participants