-
Notifications
You must be signed in to change notification settings - Fork 15
Add File System Support via pyodide.FS #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…lzer4/add_fyle_system
Fullzer4/add fyle system
fix: conventions
fix: _build_command place convention
This comment was marked as outdated.
This comment was marked as outdated.
Update: Feature Complete & Ready for ReviewQuick Summary
StructuredTool MigrationThe switch to StructuredTool enables dynamic descriptions that automatically list available files, improving developer experience by providing clear visibility into attached resources. Technical DetailsThis implementation focuses only on MEMFS support as documented in the Pyodide File System documentation. Ready for review! 🚀 Open to feedback and suggestions for improvements. @eyurtsev @vbarda Could you please review when you have a chance? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fullzer4 this PR is really amazing. Thank you for putting all of this together! I left a few questions about the extra entrypoints in the code and would like to figure out how to avoid them if possible.
Thanks also for fixing the various issues in the existing code! You caught more than one bug!
I think we have to pass the file bytes via stdin rather than as arg as most OSs impose a max limit on the size of the arg value that can be passed |
Fix/conventions and patterns
@eyurtsev I've completed the filesystem implementation with stdin. It's working well and solves the CLI argument size limitations we discussed. I chose a binary protocol over JSON stdin for file transfer, implemented with ReadableStream for efficient binary data handling, because it preserves binary data integrity and avoids "Maximum call stack size exceeded" errors that occur when sending large files as JSON through stdin. Would appreciate your thoughts on this approach. I'm already working on the output functionality with a get_file method to extract files from the sandbox after execution. Will submit this in a separate PR once it's ready and tested. Regarding the permissions flags, I had to make changes to fix critical errors like: Error: Requires write access to "/home/fullzer4/path/node_modules/.deno/pyodide@0.27.5/node_modules/pyodide/micropip-0.8.0-py3-none-any.whl" and Code NotCapable: Requires read access to "/path/node_modules/.deno/pyodide@0.27.5/node_modules/pyodide/pyodide.asm.wasm" The approach follows the principle of least privilege while ensuring everything works properly. For the description part, I implemented a template-based system with _description_template and _build_description() that dynamically shows available files to the LLM: A secure Python code sandbox with filesystem support... ATTACHED FILES AVAILABLE: • data.csv • config.json These files are already loaded and ready to use... The implementation now guarantees that when users provide custom descriptions, they are respected exactly as specified, while still supporting dynamic file information through an optional placeholder system. I kept the filesystem methods (attach_file, create_directory, etc.) as they provide a clean, intuitive API. These methods make working with sandbox files much simpler than reimplementing these operations in user code. I've also implemented the constructor-based file attachment you suggested, so users can now provide files directly when creating the sandbox tool: sandbox_tool = PyodideSandboxTool(
allow_net=True,
files={
"sales.csv": sales_data
}
) I decided to maintain both approaches - constructor-based initialization and method-based file operations - because they serve different use cases. The constructor approach is perfect for setting up files at creation time in a clean, declarative style, while the method-based approach allows for more dynamic file manipulation during the tool's lifecycle. This flexibility gives users the best of both worlds without forcing them into a single pattern. Let me know if you'd like me to adjust anything before merging. |
Hi @eyurtsev, I hope you're doing well! I just wanted to gently check in on the filesystem support PR when you have a moment. I'm eager to hear your thoughts. Thank you so much! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few questions and I'd like to remove the mutability and and various help methods on the Python side.
I can take care of the changes on the python side if you'd like me to!
If you want to get the changes in more quickly, we can also break this PR into smaller ones (e.g., just tackling the typescript side first.)
Thanks for offering to help! I'm happy to handle the remaining changes on the Python side myself - there's no rush on my end to get these changes in immediately. I've already addressed some of your review points with recent commits (specifically the unnecessary filesystem operations and permission model reversion). If you prefer breaking this into smaller PRs as you suggested, I'm open to that approach too, but I don't want you to worry about making the changes yourself unless you specifically want to. I'm comfortable implementing the remaining feedback items based on your guidance. |
Hi @eyurtsev, Just checking in on the recent suggestions regarding the PR
I left a comment marked "Left a few questions..." and would appreciate it if you could review the points I raised when you have a chance. No rush—just a reminder so the PR doesn't get stale. |
Hi @fullzer4, Thank you very much for submitting this PR. I tried running the example locally but ran into the following error:
Here’s the code I used: files = {
"input.txt": "Hello world!"
}
# Define sandbox tool with filesystem support
sandbox_tool = PyodideSandboxTool(
files=files,
allow_net=True,
)
# Create an agent with the sandbox tool
agent = create_react_agent("anthropic:claude-3-7-sonnet-latest", [sandbox_tool])
query = "display the content of input.txt"
async def run_agent(query: str):
async for chunk in agent.astream({"messages": query}):
print(chunk)
if __name__ == "__main__":
asyncio.run(run_agent(query)) Could you please advise how to resolve this? Did I miss a configuration step, or is there something I need to adjust in the sandbox setup? Thank you for your help! |
PKG_NAME = "jsr:@langchain/pyodide-sandbox@0.0.4" |
Hi @rayshen92, The error occurred because the system was referencing an older version of the pyodide-sandbox-js component (e.g., "jsr:@langchain/pyodide-sandbox@0.0.4"). By patching the PKG_NAME to point to the local main.ts file, we ensure that the updated, local version is used to test both components together. For example, in the tests we apply the following patch: @pytest.fixture
def pyodide_package(monkeypatch: pytest.MonkeyPatch) -> None:
"""Patch PKG_NAME to point to a local deno typescript file."""
if os.environ.get("RUN_INTEGRATION", "").lower() == "true":
# Skip this test if running in integration mode
return
local_script = str(current_dir / "../../../pyodide-sandbox-js/main.ts")
monkeypatch.setattr("langchain_sandbox.pyodide.PKG_NAME", local_script) Without this patch, you would be using the old version. Alternatively, to work without changing the PKG_NAME in tests, a new version would need to be generated—for example: PKG_NAME = "jsr:@langchain/pyodide-sandbox@0.0.7" # new version example This ensures that the correct version is used, which is why the error occurred when referencing the outdated component. Hope this clears! |
Hi @eyurtsev, Following up on my previous message about the PR feedback. I wanted to let you know that the output files functionality is now complete. I'd like to wrap up this PR so I can focus on implementing other library features and get this version released. I'm still waiting for your feedback on those specific points I mentioned earlier (the helper methods for mutability and the Pydantic model_post_init approach). Once we resolve those questions, I think we'll be in good shape to merge this and move forward with the next set of features. Thanks for your time! |
Hi @eyurtsev, Quick update: I revisited the code in production and confirmed your earlier point—the mutability helpers weren’t adding real value. I’ve now removed them, made the models fully immutable, and shifted all directory/file handling into the constructor. Let me know if there’s anything else you’d like tweaked; otherwise I think we’re ready to merge. Thanks again for the guidance! |
@fullzer4 awesome sorry for the delay in responding. I'll will like commandeer to take over the changes. |
This would be a welcomed feature. Very recently I switched back to running LLM generated Python code in containers for the very reason that this sandbox cannot read and write from file system. Many of my Langgraph applications read CSV/XLS files, analyze data, save reports, amongst others that need FS access. |
Add File System Support via pyodide.FS
This PR implements comprehensive file system operations in the PyodideSandbox, addressing issue #34 (Allow attaching files) and issue #26 (Preserve newlines when printing).
Key Features
File System Operations
attach_file()
andattach_files()
open()
,Path()
, and standard file operationsBug Fixes
print(5); print(5)
now correctly outputs5\n5
instead of55
Implementation Details
FileSystemOperation
interface for defining file operationsmain.ts
using pyodide.FSUsage Example
References