Add rollout saving to GRPO training#1406
Add rollout saving to GRPO training#1406finbarrtimbers merged 16 commits intomainallenai/open-instruct:mainfrom
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary of ChangesHello @finbarrtimbers, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the GRPO training process by implementing a robust system for saving reinforcement learning rollouts to disk. This feature allows researchers and developers to meticulously analyze the generated data, including detailed metrics and contextual information for each sample, which is crucial for debugging, performance evaluation, and further research. The asynchronous saving mechanism ensures that this data persistence does not impede the training speed, maintaining efficiency while providing valuable insights. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces functionality to save RL rollouts to disk during GRPO training, including metadata and sharded JSONL files, with asynchronous saving using a ThreadPoolExecutor. The changes are well-structured and address the stated objective. However, there are a few areas for improvement related to error handling, path configuration, and constant management.
Implement the save_traces flag to save all generated RL rollouts to disk during GRPO training. For each generation, saves prompt_tokens, response_tokens, advantage, reward, step, sample_idx, prompt_idx, dataset, finish_reason, ground_truth, and tool_info. - Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/ - Save metadata file with git commit, model name, and timestamp - Save rollouts to sharded JSONL files (new file every 10k samples) - Use async saving via ThreadPoolExecutor to not block training Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactored rollout saving to use RolloutMetadata and RolloutRecord dataclasses for type safety. Added GPU integration test to validate rollout file creation and field presence using dataclasses.fields(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Raises ValueError if save_traces is True but rollouts_save_path is empty. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add get_git_commit() to utils.py that reads from environment variable. Update data_loader.py and benchmark_generators.py to use the centralized function. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The field was missing after the Args class was moved to grpo_utils. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
32e6e76 to
e4fda5a
Compare
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
The StreamingDataLoaderConfig class uses dataset_mixer_list, not dataset_name. The tests only need save_traces and rollouts_save_path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
hamishivi
left a comment
There was a problem hiding this comment.
Generally seems fine!
* Add rollout saving to GRPO training Implement the save_traces flag to save all generated RL rollouts to disk during GRPO training. For each generation, saves prompt_tokens, response_tokens, advantage, reward, step, sample_idx, prompt_idx, dataset, finish_reason, ground_truth, and tool_info. - Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/ - Save metadata file with git commit, model name, and timestamp - Save rollouts to sharded JSONL files (new file every 10k samples) - Use async saving via ThreadPoolExecutor to not block training Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add dataclasses for rollout saving and GPU test Refactored rollout saving to use RolloutMetadata and RolloutRecord dataclasses for type safety. Added GPU integration test to validate rollout file creation and field presence using dataclasses.fields(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add validation for save_traces configuration Raises ValueError if save_traces is True but rollouts_save_path is empty. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Consolidate git commit retrieval to use GIT_COMMIT env var Add get_git_commit() to utils.py that reads from environment variable. Update data_loader.py and benchmark_generators.py to use the centralized function. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Added command to run GPU tests * Add rollouts_save_path to ExperimentConfig The field was missing after the Args class was moved to grpo_utils. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Updated changelog. * Update open_instruct/data_loader.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Updated code. * Fix failing tests by removing invalid dataset_name parameter The StreamingDataLoaderConfig class uses dataset_mixer_list, not dataset_name. The tests only need save_traces and rollouts_save_path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Added rl_utils test on GPU * Added logprobs field * restored comments --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Add rollout saving to GRPO training Implement the save_traces flag to save all generated RL rollouts to disk during GRPO training. For each generation, saves prompt_tokens, response_tokens, advantage, reward, step, sample_idx, prompt_idx, dataset, finish_reason, ground_truth, and tool_info. - Add rollouts_save_path arg with default /weka/oe-adapt-default/allennlp/deletable_rollouts/ - Save metadata file with git commit, model name, and timestamp - Save rollouts to sharded JSONL files (new file every 10k samples) - Use async saving via ThreadPoolExecutor to not block training Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add dataclasses for rollout saving and GPU test Refactored rollout saving to use RolloutMetadata and RolloutRecord dataclasses for type safety. Added GPU integration test to validate rollout file creation and field presence using dataclasses.fields(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add validation for save_traces configuration Raises ValueError if save_traces is True but rollouts_save_path is empty. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Consolidate git commit retrieval to use GIT_COMMIT env var Add get_git_commit() to utils.py that reads from environment variable. Update data_loader.py and benchmark_generators.py to use the centralized function. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Added command to run GPU tests * Add rollouts_save_path to ExperimentConfig The field was missing after the Args class was moved to grpo_utils. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Updated changelog. * Update open_instruct/data_loader.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Updated code. * Fix failing tests by removing invalid dataset_name parameter The StreamingDataLoaderConfig class uses dataset_mixer_list, not dataset_name. The tests only need save_traces and rollouts_save_path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Added rl_utils test on GPU * Added logprobs field * restored comments --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Implement the save_traces flag to save all generated RL rollouts to disk during GRPO training. For each generation, saves prompt_tokens, response_tokens, advantage, reward, step, sample_idx, prompt_idx, dataset, finish_reason, ground_truth, and tool_info.
RolloutMetadataandRolloutRecorddataclasses for type safetyGPU_TESTS=01KFGWBPSPES7P8X2BP3HGEYGC