Add rollout saving to GRPO training by finbarrtimbers · Pull Request #1406 · allenai/open-instruct

finbarrtimbers · Jan 21, 2026

Implement the save_traces flag to save all generated RL rollouts to disk during GRPO training. For each generation, it saves prompt_tokens, response_tokens, advantage, reward, step, sample_idx, prompt_idx, dataset, finish_reason, ground_truth, and tool_info.

- Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/
- Save metadata file with git commit, model name, and timestamp
- Save rollouts to sharded JSONL files (new file every 10k samples)
- Use async saving via ThreadPoolExecutor to avoid blocking training
- Add RolloutMetadata and RolloutRecord dataclasses for type safety
- Add GPU integration test validating rollout file creation and field presence
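The two dataclasses named above can be sketched roughly as follows. Only the class names and the RolloutRecord field names come from this PR. The field types, the optional defaults, the RolloutMetadata field names, and the `to_json_line` helper are assumptions for illustration. A later commit titled "Added logprobs field" is not reflected here.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RolloutMetadata:
    """Run-level metadata, written once per run (field names assumed)."""

    git_commit: str
    model_name: str
    timestamp: float  # assumed: seconds since the epoch


@dataclass
class RolloutRecord:
    """One generated sample; field names follow the PR description, types are assumed."""

    prompt_tokens: list[int]
    response_tokens: list[int]
    advantage: float
    reward: float
    step: int
    sample_idx: int
    prompt_idx: int
    dataset: str
    finish_reason: str
    ground_truth: Any = None
    tool_info: Optional[dict[str, Any]] = None


def to_json_line(record: RolloutRecord) -> str:
    """Serialize one record as a single JSONL line (hypothetical helper)."""
    return json.dumps(dataclasses.asdict(record))
```

Checking field presence with `dataclasses.fields()`, as the GPU test in this PR is described as doing, keeps a test in sync with the dataclass definition instead of a hand-maintained list of keys.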

GPU_TESTS=01KFGWBPSPES7P8X2BP3HGEYGC

gemini-code-assist · Jan 21, 2026

Summary of Changes

This pull request adds a system for saving reinforcement learning rollouts to disk during GRPO training. Each saved sample carries its tokens, metrics, and contextual information, so runs can be inspected for debugging, performance evaluation, and further research. Because saving is asynchronous, this persistence does not slow down training.

Highlights

Rollout Saving Implementation: Introduced functionality to save all generated RL rollouts to disk during GRPO training.
Detailed Data Capture: Each saved rollout includes comprehensive information such as prompt and response tokens, advantage, reward, step, sample/prompt indices, dataset, finish reason, ground truth, and tool interaction details.
Configurable Save Path: Added a rollouts_save_path argument with a default directory for storing the rollout data.
Metadata Generation: A metadata file is now saved, containing the git commit hash, model name, and a timestamp for better traceability of training runs.
Sharded JSONL Format: Rollouts are saved into sharded JSONL files, with a new file created every 10,000 samples to manage file size and improve I/O performance.
Asynchronous Saving: Utilizes a ThreadPoolExecutor for asynchronous saving of rollouts, ensuring that the training process is not blocked by disk I/O operations.
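The sharding and asynchronous saving described in these highlights can be sketched like this. `ShardedRolloutWriter`, its methods, and the `rollouts_NNNNNN.jsonl` file naming are hypothetical; only the 10,000-sample shard size, the JSONL format, and the use of `ThreadPoolExecutor` come from the PR.

```python
import json
import os
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any

SAMPLES_PER_SHARD = 10_000  # "new file every 10k samples"


class ShardedRolloutWriter:
    """Hypothetical writer: appends JSONL rows on a background thread, rotating shards."""

    def __init__(self, save_dir: str, samples_per_shard: int = SAMPLES_PER_SHARD) -> None:
        os.makedirs(save_dir, exist_ok=True)
        self.save_dir = save_dir
        self.samples_per_shard = samples_per_shard
        self.num_submitted = 0
        # One worker keeps batches in submission order and off the training thread.
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.pending: list[Future] = []

    def _shard_path(self, shard_idx: int) -> str:
        return os.path.join(self.save_dir, f"rollouts_{shard_idx:06d}.jsonl")

    def _write(self, rows: list[dict[str, Any]], start: int) -> None:
        """Runs on the worker thread; `start` is the global index of rows[0]."""
        i, idx = 0, start
        while i < len(rows):
            shard_idx = idx // self.samples_per_shard
            room = self.samples_per_shard - idx % self.samples_per_shard
            chunk = rows[i : i + room]
            with open(self._shard_path(shard_idx), "a", encoding="utf-8") as f:
                f.writelines(json.dumps(row) + "\n" for row in chunk)
            i += len(chunk)
            idx += len(chunk)

    def submit(self, rows: list[dict[str, Any]]) -> None:
        """Queue a batch for writing and return immediately."""
        # Re-raise errors from finished writes and drop them from the pending list.
        still_pending = []
        for fut in self.pending:
            if fut.done():
                fut.result()
            else:
                still_pending.append(fut)
        self.pending = still_pending
        # Reserve the index range on the caller's thread so shard boundaries are deterministic.
        start = self.num_submitted
        self.num_submitted += len(rows)
        self.pending.append(self.executor.submit(self._write, list(rows), start))

    def close(self) -> None:
        """Block until all queued writes finish, surfacing any I/O error."""
        for fut in self.pending:
            fut.result()
        self.executor.shutdown(wait=True)
```

A single worker thread is a deliberate choice in this sketch: batches are written in the order they were submitted, and because each batch's global index range is reserved before it is handed to the executor, which shard a row lands in never depends on thread scheduling.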

gemini-code-assist

Code Review

The pull request introduces functionality to save RL rollouts to disk during GRPO training, including metadata and sharded JSONL files, with asynchronous saving using a ThreadPoolExecutor. The changes are well-structured and address the stated objective. However, there are a few areas for improvement related to error handling, path configuration, and constant management.

hamishivi

Generally seems fine!

* Add rollout saving to GRPO training
  Implement the save_traces flag to save all generated RL rollouts to disk during GRPO training. For each generation, saves prompt_tokens, response_tokens, advantage, reward, step, sample_idx, prompt_idx, dataset, finish_reason, ground_truth, and tool_info.
  - Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/
  - Save metadata file with git commit, model name, and timestamp
  - Save rollouts to sharded JSONL files (new file every 10k samples)
  - Use async saving via ThreadPoolExecutor to not block training
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Add dataclasses for rollout saving and GPU test
  Refactored rollout saving to use RolloutMetadata and RolloutRecord dataclasses for type safety. Added GPU integration test to validate rollout file creation and field presence using dataclasses.fields().
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Add validation for save_traces configuration
  Raises ValueError if save_traces is True but rollouts_save_path is empty.
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Consolidate git commit retrieval to use GIT_COMMIT env var
  Add get_git_commit() to utils.py that reads from environment variable. Update data_loader.py and benchmark_generators.py to use the centralized function.
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Added command to run GPU tests
* Add rollouts_save_path to ExperimentConfig
  The field was missing after the Args class was moved to grpo_utils.
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Updated changelog.
* Update open_instruct/data_loader.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Updated code.
* Fix failing tests by removing invalid dataset_name parameter
  The StreamingDataLoaderConfig class uses dataset_mixer_list, not dataset_name. The tests only need save_traces and rollouts_save_path.
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Added rl_utils test on GPU
* Added logprobs field
* restored comments

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
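The "Add validation for save_traces configuration" commit amounts to a fail-fast check. A minimal sketch follows; `RolloutSaveConfig` is a hypothetical stand-in for the real config class, and placing the check in `__post_init__` and defaulting `save_traces` to `False` are assumptions. Only the two field names, the default path, and the raised `ValueError` come from the PR.

```python
from dataclasses import dataclass

DEFAULT_ROLLOUTS_SAVE_PATH = "/weka/oe-adapt-default/allennlp/deletable_rollouts/"


@dataclass
class RolloutSaveConfig:
    """Hypothetical config slice; only the field names and default path come from the PR."""

    save_traces: bool = False  # assumed default
    rollouts_save_path: str = DEFAULT_ROLLOUTS_SAVE_PATH

    def __post_init__(self) -> None:
        # Fail at config time instead of partway through a training run.
        if self.save_traces and not self.rollouts_save_path:
            raise ValueError("save_traces=True requires a non-empty rollouts_save_path")
```

An empty path is only an error when saving is enabled, so disabling `save_traces` with `rollouts_save_path=""` remains valid.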

gemini-code-assist Bot reviewed Jan 21, 2026

Comment thread open_instruct/data_loader.py Outdated

Comment thread open_instruct/data_loader.py Outdated

Comment thread open_instruct/data_loader.py

Comment thread open_instruct/grpo_fast.py Outdated

finbarrtimbers and others added 6 commits January 21, 2026 11:24

Add validation for save_traces configuration

f1d322a

Raises ValueError if save_traces is True but rollouts_save_path is empty. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Consolidate git commit retrieval to use GIT_COMMIT env var

dbcfec5

Add get_git_commit() to utils.py that reads from environment variable. Update data_loader.py and benchmark_generators.py to use the centralized function. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Added command to run GPU tests

1362638

Add rollouts_save_path to ExperimentConfig

e4fda5a

The field was missing after the Args class was moved to grpo_utils. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
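A minimal sketch of the consolidated `get_git_commit()` helper, and of how the metadata file from the PR description might use it. The `GIT_COMMIT` variable name comes from the commit title above; the `"unknown"` fallback, the hypothetical `save_rollout_metadata` helper, and the `metadata.json` filename are assumptions.

```python
import json
import os
import time


def get_git_commit() -> str:
    """Read the commit hash from the GIT_COMMIT environment variable.

    The "unknown" fallback is an assumption; the PR only says the value
    is read from an environment variable.
    """
    return os.environ.get("GIT_COMMIT", "unknown")


def save_rollout_metadata(save_dir: str, model_name: str) -> str:
    """Hypothetical helper: write run metadata next to the rollout shards."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, "metadata.json")
    metadata = {
        "git_commit": get_git_commit(),
        "model_name": model_name,
        "timestamp": time.time(),  # assumed: seconds since the epoch
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
    return path
```

Reading the commit from the environment rather than shelling out to `git` lets the value be set once when a job is launched, so it still works where the training container has no `.git` directory.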

finbarrtimbers force-pushed the finbarr/save-rollouts branch from 32e6e76 to e4fda5a January 21, 2026 18:25

finbarrtimbers and others added 6 commits January 21, 2026 11:27

Merge branch 'main' into finbarr/save-rollouts

0925886

Updated changelog.

c8c91be

Merge branch 'main' into finbarr/save-rollouts

f6dec46

Update open_instruct/data_loader.py

cf552d3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Updated code.

2de9204

Fix failing tests by removing invalid dataset_name parameter

4f0b6b5

The StreamingDataLoaderConfig class uses dataset_mixer_list, not dataset_name. The tests only need save_traces and rollouts_save_path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

finbarrtimbers requested a review from hamishivi January 22, 2026 20:55

finbarrtimbers enabled auto-merge January 22, 2026 20:55

hamishivi reviewed Jan 22, 2026

Comment thread open_instruct/rl_utils.py Outdated

Comment thread open_instruct/rl_utils.py

finbarrtimbers added 4 commits January 23, 2026 14:47

Merge branch 'main' into finbarr/save-rollouts

87990a1

Added rl_utils test on GPU

f4c7c38

Added logprobs field

448e242

restored comments

5789f4e

finbarrtimbers requested a review from hamishivi January 23, 2026 22:24

hamishivi approved these changes Jan 23, 2026

finbarrtimbers added this pull request to the merge queue Jan 23, 2026

Merged via the queue into main with commit d3db709 Jan 23, 2026
6 of 7 checks passed

finbarrtimbers deleted the finbarr/save-rollouts branch January 23, 2026 23:56

finbarrtimbers merged 16 commits into allenai/open-instruct:main from allenai/open-instruct:finbarr/save-rollouts

2 participants
