Add rollout saving to GRPO training #1406

Merged
finbarrtimbers merged 16 commits into allenai/open-instruct:main from allenai/open-instruct:finbarr/save-rollouts
Jan 23, 2026

Conversation

@finbarrtimbers (Collaborator) commented Jan 21, 2026

Implement the save_traces flag to save all generated RL rollouts to disk during GRPO training. For each generation, saves prompt_tokens, response_tokens, advantage, reward, step, sample_idx, prompt_idx, dataset, finish_reason, ground_truth, and tool_info.

  • Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/
  • Save metadata file with git commit, model name, and timestamp
  • Save rollouts to sharded JSONL files (new file every 10k samples)
  • Use async saving via ThreadPoolExecutor to not block training
  • Add RolloutMetadata and RolloutRecord dataclasses for type safety
  • Add GPU integration test validating rollout file creation and field presence
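
A minimal sketch of how the sharded, asynchronous saving described above might fit together. The field names on `RolloutRecord` come from the description, but their types, the `ShardedRolloutWriter` class, the `example_record` helper, and the `rollouts_NNNNNN.jsonl` file naming are all assumptions made for illustration, not the code in this PR.

```python
import dataclasses
import json
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any


@dataclasses.dataclass
class RolloutRecord:
    # Field names follow the PR description; the types are guesses.
    prompt_tokens: list[int]
    response_tokens: list[int]
    advantage: float
    reward: float
    step: int
    sample_idx: int
    prompt_idx: int
    dataset: str
    finish_reason: str
    ground_truth: Any
    tool_info: Any


class ShardedRolloutWriter:
    """Hypothetical helper: appends records to JSONL shards on a background thread."""

    def __init__(self, save_dir: str, samples_per_shard: int = 10_000):
        self.save_dir = save_dir
        self.samples_per_shard = samples_per_shard
        self.num_saved = 0
        # A single worker keeps writes in submission order.
        self.executor = ThreadPoolExecutor(max_workers=1)
        os.makedirs(save_dir, exist_ok=True)

    def _write(self, start: int, records: list[RolloutRecord]) -> None:
        # Runs on the worker thread; the global sample index picks the shard.
        for offset, record in enumerate(records):
            shard = (start + offset) // self.samples_per_shard
            path = os.path.join(self.save_dir, f"rollouts_{shard:06d}.jsonl")
            with open(path, "a") as f:
                f.write(json.dumps(dataclasses.asdict(record)) + "\n")

    def save(self, records: list[RolloutRecord]) -> Future:
        # Returns immediately; the caller (the training loop) is not blocked on disk I/O.
        start = self.num_saved
        self.num_saved += len(records)
        return self.executor.submit(self._write, start, list(records))

    def close(self) -> None:
        # Wait for pending writes before exiting.
        self.executor.shutdown(wait=True)


def example_record(i: int) -> RolloutRecord:
    # Dummy values, for demonstration only.
    return RolloutRecord(
        prompt_tokens=[1, 2], response_tokens=[3], advantage=0.0, reward=1.0,
        step=0, sample_idx=i, prompt_idx=i // 2, dataset="demo",
        finish_reason="stop", ground_truth="42", tool_info=None,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writer = ShardedRolloutWriter(tmp, samples_per_shard=2)
        writer.save([example_record(i) for i in range(5)])
        writer.close()
        print(sorted(os.listdir(tmp)))
```

Using a single worker thread is one simple way to keep appends ordered without locks; a real implementation may batch writes or handle write errors differently.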

GPU_TESTS=01KFGWBPSPES7P8X2BP3HGEYGC

@gemini-code-assist (Contributor)

Summary of Changes

Hello @finbarrtimbers, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds rollout saving to GRPO training. Generated reinforcement learning rollouts are written to disk together with per-sample metrics and context, which supports debugging, performance evaluation, and later analysis. Saving is asynchronous, so disk I/O does not block training.

Highlights

  • Rollout Saving Implementation: Introduced functionality to save all generated RL rollouts to disk during GRPO training.
  • Detailed Data Capture: Each saved rollout includes comprehensive information such as prompt and response tokens, advantage, reward, step, sample/prompt indices, dataset, finish reason, ground truth, and tool interaction details.
  • Configurable Save Path: Added a rollouts_save_path argument with a default directory for storing the rollout data.
  • Metadata Generation: A metadata file is now saved, containing the git commit hash, model name, and a timestamp for better traceability of training runs.
  • Sharded JSONL Format: Rollouts are saved into sharded JSONL files, with a new file created every 10,000 samples to manage file size and improve I/O performance.
  • Asynchronous Saving: Utilizes a ThreadPoolExecutor for asynchronous saving of rollouts, ensuring that the training process is not blocked by disk I/O operations.
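
The metadata file mentioned above could be sketched roughly as follows. The PR states that the git commit is read from the `GIT_COMMIT` environment variable via `get_git_commit()`; the `"unknown"` fallback, the `save_metadata` helper, the `metadata.json` file name, and the exact `RolloutMetadata` schema are assumptions for illustration.

```python
import dataclasses
import datetime
import json
import os


def get_git_commit() -> str:
    # Per the PR, the commit comes from the GIT_COMMIT environment variable.
    # The "unknown" fallback is an assumption made for this sketch.
    return os.environ.get("GIT_COMMIT", "unknown")


@dataclasses.dataclass
class RolloutMetadata:
    # Field names mirror the PR description; the exact schema is assumed.
    git_commit: str
    model_name: str
    timestamp: str


def save_metadata(save_dir: str, model_name: str) -> str:
    """Write a metadata JSON file into save_dir and return its path."""
    os.makedirs(save_dir, exist_ok=True)
    metadata = RolloutMetadata(
        git_commit=get_git_commit(),
        model_name=model_name,
        timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
    )
    path = os.path.join(save_dir, "metadata.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump(dataclasses.asdict(metadata), f, indent=2)
    return path
```

Writing the metadata once per run, next to the shards, makes it possible to trace a set of rollouts back to the code and model that produced them.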

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

The pull request introduces functionality to save RL rollouts to disk during GRPO training, including metadata and sharded JSONL files, with asynchronous saving using a ThreadPoolExecutor. The changes are well-structured and address the stated objective. However, there are a few areas for improvement related to error handling, path configuration, and constant management.

Comment thread open_instruct/data_loader.py Outdated
Comment thread open_instruct/data_loader.py Outdated
Comment thread open_instruct/data_loader.py
Comment thread open_instruct/grpo_fast.py Outdated
finbarrtimbers and others added 6 commits January 21, 2026 11:24

Implement the save_traces flag to save all generated RL rollouts to disk
during GRPO training. For each generation, saves prompt_tokens,
response_tokens, advantage, reward, step, sample_idx, prompt_idx,
dataset, finish_reason, ground_truth, and tool_info.

- Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/
- Save metadata file with git commit, model name, and timestamp
- Save rollouts to sharded JSONL files (new file every 10k samples)
- Use async saving via ThreadPoolExecutor to not block training

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Refactored rollout saving to use RolloutMetadata and RolloutRecord
dataclasses for type safety. Added GPU integration test to validate
rollout file creation and field presence using dataclasses.fields().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Raises ValueError if save_traces is True but rollouts_save_path is empty.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add get_git_commit() to utils.py that reads from environment variable.
Update data_loader.py and benchmark_generators.py to use the centralized function.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The field was missing after the Args class was moved to grpo_utils.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
finbarrtimbers and others added 6 commits January 21, 2026 11:27
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
The StreamingDataLoaderConfig class uses dataset_mixer_list, not
dataset_name. The tests only need save_traces and rollouts_save_path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
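
The configuration check from the commits above (a ValueError when save_traces is True but rollouts_save_path is empty) might look something like this. `RolloutSaveConfig` is a hypothetical stand-in for the real config class and models only the two fields named in the PR; the empty-string default is chosen here to make the failure case easy to show and differs from the default path the PR sets.

```python
import dataclasses


@dataclasses.dataclass
class RolloutSaveConfig:
    # Hypothetical stand-in for the real config class.
    save_traces: bool = False
    rollouts_save_path: str = ""  # illustrative default, not the PR's default

    def __post_init__(self) -> None:
        # Fail at construction time rather than at the first save during training.
        if self.save_traces and not self.rollouts_save_path:
            raise ValueError("rollouts_save_path must be set when save_traces is True")
```

Validating in `__post_init__` surfaces a misconfiguration before any GPU time is spent, which is presumably the point of the check.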
@hamishivi (Collaborator) left a comment

Generally seems fine!

Comment thread open_instruct/rl_utils.py Outdated
Comment thread open_instruct/rl_utils.py
@finbarrtimbers finbarrtimbers added this pull request to the merge queue Jan 23, 2026
Merged via the queue into main with commit d3db709 Jan 23, 2026
6 of 7 checks passed
@finbarrtimbers finbarrtimbers deleted the finbarr/save-rollouts branch January 23, 2026 23:56
sang1583535 pushed a commit to sang1583535/open-instruct that referenced this pull request Feb 3, 2026
* Add rollout saving to GRPO training

Implement the save_traces flag to save all generated RL rollouts to disk
during GRPO training. For each generation, saves prompt_tokens,
response_tokens, advantage, reward, step, sample_idx, prompt_idx,
dataset, finish_reason, ground_truth, and tool_info.

- Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/
- Save metadata file with git commit, model name, and timestamp
- Save rollouts to sharded JSONL files (new file every 10k samples)
- Use async saving via ThreadPoolExecutor to not block training

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add dataclasses for rollout saving and GPU test

Refactored rollout saving to use RolloutMetadata and RolloutRecord
dataclasses for type safety. Added GPU integration test to validate
rollout file creation and field presence using dataclasses.fields().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add validation for save_traces configuration

Raises ValueError if save_traces is True but rollouts_save_path is empty.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Consolidate git commit retrieval to use GIT_COMMIT env var

Add get_git_commit() to utils.py that reads from environment variable.
Update data_loader.py and benchmark_generators.py to use the centralized function.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Added command to run GPU tests

* Add rollouts_save_path to ExperimentConfig

The field was missing after the Args class was moved to grpo_utils.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Updated changelog.

* Update open_instruct/data_loader.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Updated code.

* Fix failing tests by removing invalid dataset_name parameter

The StreamingDataLoaderConfig class uses dataset_mixer_list, not
dataset_name. The tests only need save_traces and rollouts_save_path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Added rl_utils test on GPU

* Added logprobs field

* restored comments

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
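
The field-presence check that the commit message attributes to the GPU test (using dataclasses.fields()) could be approximated as below. The `RolloutRecord` field names are taken from the commit message, including the later `logprobs` field; the types and the `missing_fields` helper are assumptions, and this is not the test added in the PR.

```python
import dataclasses
import json
from typing import Any


@dataclasses.dataclass
class RolloutRecord:
    # Names from the commit message; types are guesses.
    prompt_tokens: list[int]
    response_tokens: list[int]
    logprobs: list[float]
    advantage: float
    reward: float
    step: int
    sample_idx: int
    prompt_idx: int
    dataset: str
    finish_reason: str
    ground_truth: Any
    tool_info: Any


def missing_fields(jsonl_line: str) -> set[str]:
    """Return the RolloutRecord field names absent from one saved JSONL line."""
    record = json.loads(jsonl_line)
    expected = {f.name for f in dataclasses.fields(RolloutRecord)}
    return expected - set(record)
```

Deriving the expected keys from the dataclass means the check keeps working when a field is added to `RolloutRecord`, without a hard-coded list to update.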
lukashelff pushed a commit to lukashelff/open-instruct-slurm that referenced this pull request Feb 19, 2026