Add Docker sandbox backend and GenericSandboxEnv environment #1490

Merged
hamishivi merged 16 commits into main (allenai/open-instruct:main) from add-docker-sandbox-env (allenai/open-instruct:add-docker-sandbox-env)
Feb 24, 2026

Conversation

Collaborator

@hamishivi hamishivi commented Feb 22, 2026

Summary

Adds Docker sandbox infrastructure for RL training with code execution, adapted from PR #1453:

backends.py: SandboxBackend ABC + DockerBackend:

  • Configurable image, command timeout, and memory limit (mem_limit)
  • put_archive / get_archive for robust file I/O (handles large files, binary data)
  • Command output truncated to 1MB to prevent OOM
  • remove=True containers with graceful stop, kill-on-failure fallback
  • run_code validates language (currently Python only)
  • read_file supports binary mode, raises FileNotFoundError / IsADirectoryError
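As a rough illustration of the archive-based file I/O listed above: the Docker SDK's `put_archive` accepts tar data and `get_archive` returns a tar stream, so file contents have to be packed into and unpacked from an in-memory tar archive. The helper names below (`pack_file`, `unpack_file`) are hypothetical and are not taken from `backends.py`; the container calls appear only in comments.

```python
import io
import tarfile


def pack_file(name: str, data: bytes) -> bytes:
    """Pack one file into an in-memory tar archive (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)  # the size must be set before adding the member
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_file(archive: bytes, name: str) -> bytes:
    """Read one file back out of a tar archive, with file-like error types."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        try:
            member = tar.getmember(name)
        except KeyError:
            raise FileNotFoundError(name) from None
        if member.isdir():
            raise IsADirectoryError(name)
        return tar.extractfile(member).read()


# With a running container from the Docker SDK, the round trip would look like:
#   container.put_archive("/workspace", pack_file("a.bin", payload))
#   stream, _stat = container.get_archive("/workspace/a.bin")
#   data = unpack_file(b"".join(stream), "a.bin")
```

Because the payload travels as tar bytes rather than through a shell command, binary data and large files need no quoting or escaping.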

generic_sandbox.py: GenericSandboxEnv:

  • execute_bash — stateful bash (env vars and cwd persist between calls via wrapper script)
  • str_replace_editor — file viewer/editor (view/create/str_replace/insert) with correct line numbering in view ranges
  • coerce_args for type-safe tool arguments from model output
  • Configurable error penalty (default -0.05)
  • Reset with 3-attempt retry, fresh backend per reset
  • Git init with identity config (only if git available in image)
  • Catches FileNotFoundError in editor to return model-friendly errors
  • Immutable tool definitions tuple
  • GenericSandboxEnvConfig with backend, image, mem_limit, penalty, write_prompt_file, timeout
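To make the `coerce_args` bullet concrete, here is a minimal sketch of type coercion for tool arguments. It assumes the tool parameters are described by JSON-schema-style `properties` with a `type` field; the signature and behavior are illustrative guesses, not the function that ships in `generic_sandbox.py`.

```python
from typing import Any


def coerce_args(args: dict[str, Any], properties: dict[str, dict[str, str]]) -> dict[str, Any]:
    """Best-effort conversion of model-produced arguments to their declared types."""
    coerced: dict[str, Any] = {}
    for name, value in args.items():
        expected = properties.get(name, {}).get("type")
        if isinstance(value, str):
            if expected == "integer":
                try:
                    value = int(value)
                except ValueError:
                    pass  # leave unparsable input untouched for later validation
            elif expected == "number":
                try:
                    value = float(value)
                except ValueError:
                    pass
            elif expected == "boolean":
                lowered = value.strip().lower()
                if lowered in ("true", "false"):
                    value = lowered == "true"
        elif expected == "string":
            # Models sometimes emit numbers where a string is expected.
            value = str(value)
        coerced[name] = value
    return coerced
```

Values that cannot be converted are passed through unchanged, so the environment can still return a model-friendly error instead of raising during parsing.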

Integration:

  • Registered as generic_sandbox in TOOL_REGISTRY for auto-discovery from datasets
  • Exported from open_instruct.environments package
  • Added docker>=7.0.0 dependency

Debug:

  • scripts/train/debug/envs/sandbox_lm_1gpu.sh — 1-GPU training script
  • scripts/train/debug/envs/sandbox_lm_system_prompt.txt — system prompt for sandbox tasks

Test plan

  • Existing environment tests pass (pytest tests/test_environments.py)
  • All imports resolve correctly
  • TOOL_REGISTRY includes generic_sandbox
  • Linter passes (make style && make quality)
  • Test sandbox_lm_1gpu.sh on a GPU machine with Docker

Working 8-GPU Beaker script: https://beaker.org/orgs/ai2/workspaces/open-instruct-dev/work/01KJ4EKS3XC153FKTTJEN132PM?taskId=01KJ4EKS43ZGDVZMX0MXJ7F7X7&jobId=01KJ4EKS7WWBS76ZGC7C447GNB

@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch from b145ea6 to 1506151 Compare February 22, 2026 20:21
@gemini-code-assist
Contributor

Summary of Changes

Hello @hamishivi, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reinforcement learning capabilities by integrating robust code execution environments. It introduces a Docker-based sandbox backend, allowing for secure and controlled execution of code and commands during RL training. Two new environments, SandboxEnv and SandboxLMEnv, provide varying levels of interaction, from basic script execution to a full-featured, stateful coding environment, enabling models to interact with a simulated system for complex problem-solving. This foundational work expands the types of tasks that can be tackled within the RL framework, particularly those requiring dynamic code interaction.

Highlights

  • New Sandbox Infrastructure: Introduced a SandboxBackend abstract base class and its DockerBackend implementation, enabling code execution within Docker containers with features like command timeouts, output truncation, and memory limits.
  • New RL Environments: Added SandboxEnv for basic code execution tasks with execute, run_code, and submit tools, and SandboxLMEnv which mirrors the llm-in-sandbox interface, offering stateful execute_bash and a str_replace_editor for advanced file manipulation.
  • Tool Integration and Configuration: Both new sandbox environments are registered in TOOL_REGISTRY for auto-discovery from datasets, and their configurations (SandboxEnvConfig, SandboxLMEnvConfig) are integrated into the environment system.
  • Dependency and Debugging Support: The docker Python SDK (version >= 7.0.0) has been added as a dependency, and a 1-GPU debug script (sandbox_lm_1gpu.sh) with a specific system prompt has been included for testing the SandboxLMEnv.
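The output truncation mentioned in the highlights can be pictured roughly as follows. This is a sketch only: the function name, the exact byte limit (1024 * 1024 is assumed here for "1MB"), and the wording of the truncation notice are assumptions rather than the behavior of `DockerBackend`.

```python
# Assumed cap; the PR description says "1MB" without giving the exact byte count.
MAX_OUTPUT_BYTES = 1024 * 1024


def truncate_output(raw: bytes, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Decode command output, keeping at most `limit` bytes of it."""
    if len(raw) <= limit:
        return raw.decode("utf-8", errors="replace")
    # Slicing bytes may split a multi-byte character; errors="replace" absorbs that.
    kept = raw[:limit].decode("utf-8", errors="replace")
    return f"{kept}\n[output truncated: {len(raw) - limit} bytes omitted]"
```

Capping before decoding keeps memory use bounded even when a command prints far more than the model could usefully read.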

Changelog
  • CHANGELOG.md
    • Added Docker sandbox backend and SandboxEnv/SandboxLMEnv environments for code execution during RL training. Includes DockerBackend with command timeout and output truncation, SandboxEnv (basic execute/run_code/submit), SandboxLMEnv (execute_bash + str_replace_editor mirroring llm-in-sandbox), and a 1-GPU debug script.
Activity
  • The pull request was created by hamishivi, introducing new sandbox environments and a Docker backend for RL training.
  • The author provided a detailed summary of the changes, including the new backend, environments, and a debug script.
  • A test plan was outlined, covering existing environment tests, import resolution, TOOL_REGISTRY inclusion, linter checks, and a pending test for the sandbox_lm_1gpu.sh script on a GPU machine with Docker.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch from 1506151 to 41cc028 Compare February 22, 2026 20:25
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request introduces a robust Docker-based sandbox infrastructure for RL training, including two new environments (SandboxEnv and SandboxLMEnv). The implementation is well-structured, leveraging an abstract base class for backends and providing stateful bash execution. I have identified a few areas for improvement, primarily around path safety in shell commands, hardcoded resource limits, and minor path manipulation logic.

I am having trouble creating individual review comments. Click here to see my feedback.

open_instruct/environments/backends.py (128)

high

The path variable is injected directly into a shell command without quoting. This will fail if the path contains spaces or special characters. Using shlex.quote ensures the path is safely handled by the shell.

        self._container.exec_run(["bash", "-c", f"echo '{encoded_content}' | base64 -d > {shlex.quote(path)}"])
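A small standalone check of what `shlex.quote` buys in the suggestion above. The helper name `build_write_command` is hypothetical; it only builds the shell string and does not touch a container.

```python
import shlex


def build_write_command(encoded_content: str, path: str) -> str:
    """Build the base64-decode-and-redirect command with both values shell-quoted."""
    return f"echo {shlex.quote(encoded_content)} | base64 -d > {shlex.quote(path)}"


# A path with a space and shell metacharacters stays a single shell word.
tricky_path = "/tmp/my file; rm -rf x.txt"
command = build_write_command("aGk=", tricky_path)
# shlex.split tokenizes the way a POSIX shell would, so the last token is the target.
recovered_path = shlex.split(command)[-1]
```

Here `recovered_path` equals `tricky_path`, which means the shell would treat the whole string as one redirection target instead of running the text after the semicolon.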

open_instruct/environments/backends.py (69)

medium

The memory limits for the Docker container are currently hardcoded to 4GB. It would be better to make these configurable via the constructor to allow for different task requirements.

    def __init__(self, image: str = "ubuntu:24.04", timeout: int = 1800, mem_limit: str = "4g"):
        """
        Args:
            image: Docker image to use (default: ubuntu:24.04)
            timeout: Per-command timeout in seconds (default: 1800 / 30 min)
            mem_limit: Memory limit for the container (default: 4g)
        """
        self._image = image
        self._timeout = timeout
        self._mem_limit = mem_limit
        self._container = None
        self._client = None

open_instruct/environments/backends.py (88)

medium

Using the configurable memory limit instead of a hardcoded value.

        self._container = self._client.containers.run(
            self._image, command="sleep infinity", detach=True, remove=True, mem_limit=self._mem_limit, memswap_limit=self._mem_limit
        )

open_instruct/environments/backends.py (134)

medium

When executing cat, it's safer to use -- to signal the end of command options. This prevents cat from interpreting a filename starting with a hyphen as an option.

        exit_code, output = self._container.exec_run(["cat", "--", path])

open_instruct/environments/sandbox_lm.py (271)

medium

Manual path manipulation using string splitting can be brittle. Since the sandbox is Linux-based, using posixpath.dirname is a more robust and standard way to extract the directory name.

        import posixpath
        parent = posixpath.dirname(path)

@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch from 41cc028 to 882d68b Compare February 22, 2026 20:35
@hamishivi
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a Docker-based sandbox for code execution in RL environments, adding a SandboxBackend abstraction, a DockerBackend implementation, and a GenericSandboxEnv that provides execute_bash and file editing tools. The changes are well-structured and the implementation is solid. I've provided a few suggestions to improve the robustness of the DockerBackend implementation, particularly around file I/O, and a minor correction to a system prompt file.

Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/backends.py Outdated
@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch 2 times, most recently from 9ed0a7d to 552afa4 Compare February 22, 2026 20:48
@hamishivi
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request introduces a Docker sandbox backend and two new environments, SandboxEnv and SandboxLMEnv, for RL training with code execution. It also updates the CHANGELOG.md, open_instruct/environments/__init__.py, open_instruct/environments/tools/tools.py, and pyproject.toml files to integrate these new features and dependencies. A new debug script sandbox_lm_1gpu.sh and its corresponding system prompt file sandbox_lm_system_prompt.txt are added. The changes enhance the project's capability for code-aware RL agents by providing a sandboxed execution environment.

Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/__init__.py
Comment thread open_instruct/environments/generic_sandbox.py Outdated
Comment thread open_instruct/environments/generic_sandbox.py Outdated
Comment thread scripts/train/debug/envs/sandbox_lm_system_prompt.txt Outdated
@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch 5 times, most recently from 77c6a8d to de5f289 Compare February 22, 2026 20:58
@hamishivi
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a Docker sandbox backend and two new environments, SandboxEnv and SandboxLMEnv, for RL training with code execution. The changes include new files for backend implementation (backends.py) and the sandbox environments (generic_sandbox.py), updates to __init__.py and tools.py to register these new components, and a new debug script (sandbox_lm_1gpu.sh) along with its system prompt. The pyproject.toml and uv.lock files have also been updated to include the docker dependency. The code generally looks good, with clear abstractions for the sandbox backend and well-defined tool interfaces. I've identified a few areas for improvement regarding error handling, logging, and consistency in the CHANGELOG.md entry.

Comment thread open_instruct/environments/backends.py Outdated
Comment thread CHANGELOG.md Outdated
Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/generic_sandbox.py Outdated
Comment thread open_instruct/environments/generic_sandbox.py
Comment thread open_instruct/environments/generic_sandbox.py Outdated
Comment thread open_instruct/environments/generic_sandbox.py Outdated
Comment thread open_instruct/environments/generic_sandbox.py Outdated
@hamishivi hamishivi changed the title from "Add Docker sandbox backend and SandboxEnv/SandboxLMEnv environments" to "Add Docker sandbox backend and GenericSandboxEnv environment" Feb 22, 2026
@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch 8 times, most recently from b86560d to 23371fb Compare February 22, 2026 21:04
@hamishivi
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a robust Docker-based sandbox infrastructure for RL training, featuring a stateful bash environment and a file editor tool. The implementation of state persistence in Docker containers via environment variable and CWD tracking is clever. However, there are a few issues regarding command defaults, potential hangs in file reading, and image-specific dependencies that should be addressed to ensure reliability across different Docker images.
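The state persistence described above (environment variables and working directory surviving between calls) can be approximated with a wrapper script along these lines. This is a minimal sketch, not the code in `generic_sandbox.py`: the function name, the state directory, and the file layout are all assumptions.

```python
import shlex


def build_stateful_script(command: str, state_dir: str = "/tmp/.sandbox_state") -> str:
    """Wrap `command` so exported variables and the cwd carry over between calls."""
    quoted_dir = shlex.quote(state_dir)
    env_file = shlex.quote(f"{state_dir}/env")
    cwd_file = shlex.quote(f"{state_dir}/cwd")
    lines = [
        f"mkdir -p {quoted_dir}",
        # Restore exported variables saved by the previous call, if any.
        f"[ -f {env_file} ] && . {env_file}",
        # Restore the previous working directory, if any.
        f'[ -f {cwd_file} ] && cd "$(cat {cwd_file})"',
        # Run the model's command in this same shell so `export` and `cd` take effect.
        command,
        "__rc=$?",
        # Save state for the next call, then report the command's exit status.
        f"export -p > {env_file}",
        f"pwd > {cwd_file}",
        "exit $__rc",
    ]
    return "\n".join(lines)
```

The resulting string would be run as `["bash", "-c", script]` in a fresh shell per call. One limitation of this simple form: a command that calls `exit` itself ends the shell before the state is saved.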

Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/backends.py Outdated
Comment thread open_instruct/environments/generic_sandbox.py Outdated
Comment thread open_instruct/environments/generic_sandbox.py Outdated
@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch from 23371fb to 67caf7a Compare February 22, 2026 21:10
@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch 6 times, most recently from 1e537d8 to ebf4cb1 Compare February 23, 2026 00:09
Adds sandbox infrastructure for RL training with code execution:

- backends.py: SandboxBackend ABC + DockerBackend (command timeout,
  output truncation, 4GB memory limit, put_archive file transfer)
- generic_sandbox.py: GenericSandboxEnv with execute_bash +
  str_replace_editor (stateful bash, file editing, retry on reset)
- Register as 'generic_sandbox' in TOOL_REGISTRY for auto-discovery
- Add docker>=7.0.0 dependency
- Add 1-GPU debug script and system prompt

Co-authored-by: Cursor <cursoragent@cursor.com>
@hamishivi hamishivi force-pushed the add-docker-sandbox-env branch from ebf4cb1 to 6072c71 Compare February 23, 2026 00:10
root and others added 8 commits February 23, 2026 00:22
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without env_name, GenericSandboxEnv tools never get dispatched because
the rollout loop never acquires the env or registers its inner tool
names. This finds the first stateful env (not a simple Tool) and sets
it as the default so samples without an explicit env_config column get
dispatched correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When using --tools generic_sandbox, the pool is keyed by "generic_sandbox"
but the model calls "execute_bash" and "str_replace_editor". These inner
tool names were missing from tool_call_names, causing validate_dataset_tools
to reject datasets listing them and EnvStatistics to miss their metrics.

Fix: collect tool definitions from CLI pools first, build known_tool_names
set, and include inner function names in tools_config.tool_call_names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mounts /var/run/docker.sock from the host so GenericSandboxEnv can
create Docker containers inside Beaker jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For stateful environments like GenericSandboxEnv, the pool key
(e.g., "generic_sandbox") was included alongside the actual tool
names ("execute_bash", "str_replace_editor") in tool_call_names.
This caused the env name to appear in metrics and validation.

Now only include tool definition names (what the model actually
calls), not pool/env keys.

Co-authored-by: Cursor <cursoragent@cursor.com>
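The naming rule from the commit above (report what the model calls, not the pool key) can be sketched as follows. The data shapes are assumptions: pools are modeled as a dict from pool key to a list of OpenAI-style tool definitions, and `collect_tool_call_names` is a hypothetical helper, not a function from the repository.

```python
def collect_tool_call_names(pools: dict[str, list[dict]]) -> list[str]:
    """Return the function names the model can call, ignoring pool/env keys."""
    names: list[str] = []
    for definitions in pools.values():
        for definition in definitions:
            name = definition["function"]["name"]
            if name not in names:  # keep first-seen order, drop duplicates
                names.append(name)
    return names


pools = {
    "generic_sandbox": [
        {"type": "function", "function": {"name": "execute_bash"}},
        {"type": "function", "function": {"name": "str_replace_editor"}},
    ]
}
tool_call_names = collect_tool_call_names(pools)
```

With this input, `tool_call_names` contains `execute_bash` and `str_replace_editor` but not `generic_sandbox`, so metrics and dataset validation only see names the model actually emits.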
With 16 unique prompts x 4 samples = 64 rollouts, pool_size=8 was
a bottleneck causing long acquire waits. 16 gives one sandbox per
unique prompt (64GB containers on Jupiter nodes).

Co-authored-by: Cursor <cursoragent@cursor.com>
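The sizing argument in the commit above comes down to simple arithmetic. The sketch below assumes each sandbox serves one rollout at a time and that rollouts take similar time; `rollout_waves` is a hypothetical helper for illustration.

```python
import math


def rollout_waves(unique_prompts: int, samples_per_prompt: int, pool_size: int) -> int:
    """Number of sequential rounds needed if each sandbox handles one rollout at a time."""
    total_rollouts = unique_prompts * samples_per_prompt
    return math.ceil(total_rollouts / pool_size)


waves_with_8 = rollout_waves(16, 4, 8)    # 64 rollouts over 8 sandboxes
waves_with_16 = rollout_waves(16, 4, 16)  # 64 rollouts over 16 sandboxes
```

Under these assumptions, doubling the pool from 8 to 16 halves the number of rounds from 8 to 4, which is consistent with the shorter acquire waits the commit describes.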
hamishivi and others added 3 commits February 22, 2026 21:54
Co-authored-by: Cursor <cursoragent@cursor.com>
run_code (write temp file + execute) is the same for all backends.
Moved it to SandboxBackend as a concrete method so backends only
need to implement run_command. Removed the DockerBackend override.

Co-authored-by: Cursor <cursoragent@cursor.com>
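The refactor described in the commit above follows the template-method pattern: the base class owns `run_code`, and each backend supplies only `run_command`. The sketch below is an assumption-laden illustration; the result type, the temp-file path, and the base64 transfer are invented for the example and may differ from `backends.py`.

```python
import base64
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    exit_code: int
    output: str


class SandboxBackend(ABC):
    @abstractmethod
    def run_command(self, command: str) -> ExecutionResult:
        """Backend-specific: run a shell command inside the sandbox."""

    def run_code(self, code: str, language: str = "python") -> ExecutionResult:
        """Shared by all backends: write a temp file, then execute it."""
        if language != "python":
            raise ValueError(f"Unsupported language: {language!r}")
        # Base64 output contains only shell-safe characters, so no quoting is needed.
        encoded = base64.b64encode(code.encode("utf-8")).decode("ascii")
        path = "/tmp/snippet.py"  # assumed location
        return self.run_command(f"echo {encoded} | base64 -d > {path} && python3 {path}")


class RecordingBackend(SandboxBackend):
    """Stand-in backend that records commands instead of running them."""

    def __init__(self) -> None:
        self.commands: list[str] = []

    def run_command(self, command: str) -> ExecutionResult:
        self.commands.append(command)
        return ExecutionResult(exit_code=0, output="")
```

A new backend written this way inherits language validation and the write-then-execute flow without duplicating them.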
Co-authored-by: Cursor <cursoragent@cursor.com>
Collaborator

@natolambert natolambert left a comment

Seems fine. One question: can we make docker an optional dependency? I'm not sure how central you expect this path to be, especially given all the try/except guards around importing docker. If it's required, we don't need the try/excepts.

Collaborator

Where'd we get this? If we took it from somewhere can we credit?

Collaborator Author

Oh yeah, let me add a comment to the script. It is modified from https://github.com/llm-in-sandbox/llm-in-sandbox

hamishivi and others added 3 commits February 23, 2026 18:48
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@hamishivi
Collaborator Author

For now, I'll keep docker as a main dep, and remove the import guards. If it becomes onerous in the future we can change it fairly easily.

Co-authored-by: Cursor <cursoragent@cursor.com>
@hamishivi hamishivi added this pull request to the merge queue Feb 24, 2026
Merged via the queue into main with commit c67b3b5 Feb 24, 2026
7 checks passed
@hamishivi hamishivi deleted the add-docker-sandbox-env branch February 24, 2026 03:35