Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4) #1396

finbarrtimbers merged 6 commits into allenai/open-instruct:main from allenai/open-instruct:finbarr/grpo-utils-config
Conversation
Summary of Changes (Gemini Code Assist): This pull request refactors the experiment configuration by extracting the shared `ExperimentConfig` dataclass out of `grpo_fast.py` into a new `grpo_utils.py` module, so that both the existing DeepSpeed trainer and the upcoming OLMo-core trainer can share it.
Code Review
This pull request successfully extracts the ExperimentConfig dataclass from grpo_fast.py to a new grpo_utils.py module, and updates all references accordingly. This is a good step towards modularizing the codebase and preparing for future trainer implementations. The changes are well-contained and the type hint updates are consistent across the affected files.
From `ExperimentConfig.__post_init__` in `grpo_utils.py`:

```python
def __post_init__(self):
    if self.use_vllm_logprobs and self.truncated_importance_sampling_ratio_cap > 0.0:
        raise ValueError(
            "Cannot use both `use_vllm_logprobs` and `truncated_importance_sampling_ratio_cap`. "
            "use_vllm_logprobs sets old_logprobs to vLLM logprobs, making importance sampling pointless."
        )
    self.loss_denominator = utils.get_denominator(self.loss_denominator)
    if self.checkpoint_state_freq > 0 and self.checkpoint_state_dir is None:
        raise ValueError("`checkpoint_state_dir` must be provided if `checkpoint_state_freq` is greater than 0!")
    if self.checkpoint_state_dir is not None and self.checkpoint_state_freq == -1:
        raise ValueError("`checkpoint_state_freq` must be greater than 0 if `checkpoint_state_dir` is provided!")

    if self.gs_checkpoint_state_dir is not None and not self.gs_checkpoint_state_dir.startswith("gs://"):
        raise ValueError(f"`gs_checkpoint_state_dir` must start with 'gs://', got: {self.gs_checkpoint_state_dir}")
    if self.gs_bucket_path is not None and not self.gs_bucket_path.startswith("gs://"):
        raise ValueError(f"`gs_bucket_path` must start with 'gs://', got: {self.gs_bucket_path}")
    if self.sequence_parallel_size > 1 and self.deepspeed_stage != 3:
        raise ValueError("`sequence_parallel_size` > 1 requires `deepspeed_stage` to be 3!")

    if self.gs_bucket_path is not None and self.gs_checkpoint_state_dir is None:
        if self.checkpoint_state_dir is None:
            raise ValueError("`checkpoint_state_dir` must be provided when using `gs_bucket_path`!")
        checkpoint_dir_name = self.checkpoint_state_dir.rstrip("/")
        beaker_users = get_beaker_whoami()
        if beaker_users is not None:
            self.gs_checkpoint_state_dir = f"{self.gs_bucket_path}/{beaker_users}/{checkpoint_dir_name}"
        else:
            self.gs_checkpoint_state_dir = f"{self.gs_bucket_path}/{checkpoint_dir_name}"
        if not checkpoint_dir_name.startswith("/filestore"):
            self.checkpoint_state_dir = f"/filestore{self.checkpoint_state_dir}"

    if self.checkpoint_state_dir is not None:
        if self.gs_checkpoint_state_dir is not None:
            download_latest_checkpoint_from_gs(self.gs_checkpoint_state_dir, self.checkpoint_state_dir)
        calibrate_checkpoint_state_dir(self.checkpoint_state_dir)
    if not self.load_ref_policy and self.beta != 0.0:
        raise ValueError(
            "When load_ref_policy=False, beta must be 0.0. "
            f"Got beta={self.beta}. Set --beta 0.0 or --load_ref_policy to use KL penalty."
        )
```
The __post_init__ method of ExperimentConfig contains logic that performs side effects, such as calling get_beaker_whoami, download_latest_checkpoint_from_gs, and calibrate_checkpoint_state_dir. While this logic was present in the original Args class, placing operational logic directly within a dataclass's __post_init__ can make the configuration less pure and introduce unexpected dependencies or side effects when simply creating an ExperimentConfig object. It's generally better to separate configuration definition from operational logic to improve modularity and testability.
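The pattern the reviewer is alluding to is keeping `__post_init__` limited to pure validation and moving side-effecting setup into an explicit step the caller opts into. A minimal sketch of that separation, with hypothetical names (`Config`, `prepare_checkpointing` are illustrations, not code from this PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    checkpoint_state_dir: Optional[str] = None
    gs_bucket_path: Optional[str] = None

    def __post_init__(self):
        # Pure validation only: no I/O, no environment lookups, no mutation
        # beyond normalizing the config's own fields.
        if self.gs_bucket_path is not None and not self.gs_bucket_path.startswith("gs://"):
            raise ValueError(f"`gs_bucket_path` must start with 'gs://', got: {self.gs_bucket_path}")


def prepare_checkpointing(cfg: Config) -> Config:
    # Operational side effects (user lookup, checkpoint download, directory
    # calibration) live in an explicit step, so merely constructing a Config
    # in a unit test never touches the network or filesystem.
    if cfg.checkpoint_state_dir is not None:
        pass  # e.g. download the latest checkpoint here
    return cfg
```

With this split, tests can instantiate `Config` freely and only integration code calls `prepare_checkpointing`.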
From the `ExperimentConfig` field definitions:

```python
loss_fn: Literal["dapo", "cispo"] = "dapo"
record_entropy: bool = False
use_vllm_logprobs: bool = False
temperature: float = field(default=1.0, init=False)
```
The temperature field is defined with init=False, meaning it's not part of the constructor, but its value is fixed to 1.0. There's no logic within this class to dynamically set or modify this value based on other parameters. Given that StreamingDataLoaderConfig also has a temperature field, this duplication and the init=False here can be confusing. If this temperature is meant to be a fixed default for the experiment, it should either be removed from ExperimentConfig (if StreamingDataLoaderConfig.temperature is the authoritative source) or explicitly used/validated within ExperimentConfig.
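For readers unfamiliar with `init=False`: such a field is excluded from the generated `__init__`, so callers cannot set it at construction time; it can only change via later assignment. A small standalone illustration (the `Sampling` class is hypothetical, not from open-instruct):

```python
from dataclasses import dataclass, field


@dataclass
class Sampling:
    top_p: float = 1.0
    # Excluded from the constructor; every instance starts at 1.0.
    temperature: float = field(default=1.0, init=False)


s = Sampling(top_p=0.9)
print(s.temperature)  # 1.0

# Passing temperature to the constructor is a TypeError:
try:
    Sampling(temperature=0.7)
except TypeError:
    print("temperature is not an __init__ parameter")

# It can still be mutated after construction, which is why it matters
# which temperature (this one or the dataloader's) is authoritative.
s.temperature = 0.7
```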
Commit: Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4)

This refactoring extracts the shared configuration class that both grpo_fast.py (existing DeepSpeed trainer) and the upcoming grpo.py (new OLMo-core trainer) need.

- Create grpo_utils.py with ExperimentConfig dataclass (moved from grpo_fast.py Args)
- Update grpo_fast.py to import from grpo_utils
- Update benchmark_generators.py to import from grpo_utils

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Force-pushed from f1f3628 to 691668a.
natolambert
left a comment
LGTM. Appreciate the bite-sized PRs, not big ones.
At DeepMind, the senior engineers would yell at you for big PRs haha.
Squash-merge commit: Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4) (allenai#1396)

* Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4)

  This refactoring extracts the shared configuration class that both grpo_fast.py (existing DeepSpeed trainer) and the upcoming grpo.py (new OLMo-core trainer) need.

  - Create grpo_utils.py with ExperimentConfig dataclass (moved from grpo_fast.py Args)
  - Update grpo_fast.py to import from grpo_utils
  - Update benchmark_generators.py to import from grpo_utils

* a bunch of claude tweaks
* Apply suggestion from @gemini-code-assist[bot]
* Cleaned up PR.
* Added comment to dataset filter.
* updated changelog

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary

- Extract `ExperimentConfig` dataclass from `grpo_fast.py` to new `grpo_utils.py` module
- Update `grpo_fast.py` to import from `grpo_utils`
- Update `benchmark_generators.py` to import from `grpo_utils`

This refactoring prepares for the OLMo-core GRPO trainer by sharing configuration between both trainers.
Testing: ran the single-GPU GRPO script on Beaker.