Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4) #1396

finbarrtimbers merged 6 commits into allenai/open-instruct:main from allenai/open-instruct:finbarr/grpo-utils-config
Conversation
Summary of Changes (Gemini Code Assist): This pull request refactors the experiment configuration by extracting the shared `ExperimentConfig` dataclass out of `grpo_fast.py` into a new `grpo_utils.py` module, so that both the existing DeepSpeed trainer and the upcoming OLMo-core trainer can share it.
Code Review
This pull request successfully extracts the ExperimentConfig dataclass from grpo_fast.py to a new grpo_utils.py module, and updates all references accordingly. This is a good step towards modularizing the codebase and preparing for future trainer implementations. The changes are well-contained and the type hint updates are consistent across the affected files.
From `ExperimentConfig.__post_init__` in `grpo_utils.py`:

```python
def __post_init__(self):
    if self.use_vllm_logprobs and self.truncated_importance_sampling_ratio_cap > 0.0:
        raise ValueError(
            "Cannot use both `use_vllm_logprobs` and `truncated_importance_sampling_ratio_cap`. "
            "use_vllm_logprobs sets old_logprobs to vLLM logprobs, making importance sampling pointless."
        )
    self.loss_denominator = utils.get_denominator(self.loss_denominator)
    if self.checkpoint_state_freq > 0 and self.checkpoint_state_dir is None:
        raise ValueError("`checkpoint_state_dir` must be provided if `checkpoint_state_freq` is greater than 0!")
    if self.checkpoint_state_dir is not None and self.checkpoint_state_freq == -1:
        raise ValueError("`checkpoint_state_freq` must be greater than 0 if `checkpoint_state_dir` is provided!")

    if self.gs_checkpoint_state_dir is not None and not self.gs_checkpoint_state_dir.startswith("gs://"):
        raise ValueError(f"`gs_checkpoint_state_dir` must start with 'gs://', got: {self.gs_checkpoint_state_dir}")
    if self.gs_bucket_path is not None and not self.gs_bucket_path.startswith("gs://"):
        raise ValueError(f"`gs_bucket_path` must start with 'gs://', got: {self.gs_bucket_path}")
    if self.sequence_parallel_size > 1 and self.deepspeed_stage != 3:
        raise ValueError("`sequence_parallel_size` > 1 requires `deepspeed_stage` to be 3!")

    if self.gs_bucket_path is not None and self.gs_checkpoint_state_dir is None:
        if self.checkpoint_state_dir is None:
            raise ValueError("`checkpoint_state_dir` must be provided when using `gs_bucket_path`!")
        checkpoint_dir_name = self.checkpoint_state_dir.rstrip("/")
        beaker_users = get_beaker_whoami()
        if beaker_users is not None:
            self.gs_checkpoint_state_dir = f"{self.gs_bucket_path}/{beaker_users}/{checkpoint_dir_name}"
        else:
            self.gs_checkpoint_state_dir = f"{self.gs_bucket_path}/{checkpoint_dir_name}"
        if not checkpoint_dir_name.startswith("/filestore"):
            self.checkpoint_state_dir = f"/filestore{self.checkpoint_state_dir}"

    if self.checkpoint_state_dir is not None:
        if self.gs_checkpoint_state_dir is not None:
            download_latest_checkpoint_from_gs(self.gs_checkpoint_state_dir, self.checkpoint_state_dir)
        calibrate_checkpoint_state_dir(self.checkpoint_state_dir)
    if not self.load_ref_policy and self.beta != 0.0:
        raise ValueError(
            "When load_ref_policy=False, beta must be 0.0. "
            f"Got beta={self.beta}. Set --beta 0.0 or --load_ref_policy to use KL penalty."
        )
```
The __post_init__ method of ExperimentConfig contains logic that performs side effects, such as calling get_beaker_whoami, download_latest_checkpoint_from_gs, and calibrate_checkpoint_state_dir. While this logic was present in the original Args class, placing operational logic directly within a dataclass's __post_init__ can make the configuration less pure and introduce unexpected dependencies or side effects when simply creating an ExperimentConfig object. It's generally better to separate configuration definition from operational logic to improve modularity and testability.
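The pattern the reviewer is alluding to is keeping `__post_init__` limited to pure validation and moving side-effecting setup into an explicit step the caller opts into. A minimal sketch of that separation, with hypothetical names (`Config`, `prepare_checkpointing` are illustrations, not code from this PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    checkpoint_state_dir: Optional[str] = None
    gs_bucket_path: Optional[str] = None

    def __post_init__(self):
        # Pure validation only: no I/O, no environment lookups, no mutation
        # beyond normalizing the config's own fields.
        if self.gs_bucket_path is not None and not self.gs_bucket_path.startswith("gs://"):
            raise ValueError(f"`gs_bucket_path` must start with 'gs://', got: {self.gs_bucket_path}")


def prepare_checkpointing(cfg: Config) -> Config:
    # Operational side effects (user lookup, checkpoint download, directory
    # calibration) live in an explicit step, so merely constructing a Config
    # in a unit test never touches the network or filesystem.
    if cfg.checkpoint_state_dir is not None:
        pass  # e.g. download the latest checkpoint here
    return cfg
```

With this split, tests can instantiate `Config` freely and only integration code calls `prepare_checkpointing`.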
From the `ExperimentConfig` field definitions:

```python
loss_fn: Literal["dapo", "cispo"] = "dapo"
record_entropy: bool = False
use_vllm_logprobs: bool = False
temperature: float = field(default=1.0, init=False)
```
The temperature field is defined with init=False, meaning it's not part of the constructor, but its value is fixed to 1.0. There's no logic within this class to dynamically set or modify this value based on other parameters. Given that StreamingDataLoaderConfig also has a temperature field, this duplication and the init=False here can be confusing. If this temperature is meant to be a fixed default for the experiment, it should either be removed from ExperimentConfig (if StreamingDataLoaderConfig.temperature is the authoritative source) or explicitly used/validated within ExperimentConfig.
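For readers unfamiliar with `init=False`: such a field is excluded from the generated `__init__`, so callers cannot set it at construction time; it can only change via later assignment. A small standalone illustration (the `Sampling` class is hypothetical, not from open-instruct):

```python
from dataclasses import dataclass, field


@dataclass
class Sampling:
    top_p: float = 1.0
    # Excluded from the constructor; every instance starts at 1.0.
    temperature: float = field(default=1.0, init=False)


s = Sampling(top_p=0.9)
print(s.temperature)  # 1.0

# Passing temperature to the constructor is a TypeError:
try:
    Sampling(temperature=0.7)
except TypeError:
    print("temperature is not an __init__ parameter")

# It can still be mutated after construction, which is why it matters
# which temperature (this one or the dataloader's) is authoritative.
s.temperature = 0.7
```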
Commit: Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4)

This refactoring extracts the shared configuration class that both grpo_fast.py (existing DeepSpeed trainer) and the upcoming grpo.py (new OLMo-core trainer) need.

- Create grpo_utils.py with ExperimentConfig dataclass (moved from grpo_fast.py Args)
- Update grpo_fast.py to import from grpo_utils
- Update benchmark_generators.py to import from grpo_utils

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Force-pushed from f1f3628 to 691668a.
natolambert
left a comment
LGTM. Appreciate the bite-sized PRs, not big ones.
At DeepMind, the senior engineers would yell at you for big PRs haha.
Squash-merge commit: Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4) (allenai#1396)

* Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4)

  This refactoring extracts the shared configuration class that both grpo_fast.py (existing DeepSpeed trainer) and the upcoming grpo.py (new OLMo-core trainer) need.

  - Create grpo_utils.py with ExperimentConfig dataclass (moved from grpo_fast.py Args)
  - Update grpo_fast.py to import from grpo_utils
  - Update benchmark_generators.py to import from grpo_utils

* a bunch of claude tweaks
* Apply suggestion from @gemini-code-assist[bot]
* Cleaned up PR.
* Added comment to dataset filter.
* updated changelog

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary

- Extract `ExperimentConfig` dataclass from `grpo_fast.py` to new `grpo_utils.py` module
- Update `grpo_fast.py` to import from `grpo_utils`
- Update `benchmark_generators.py` to import from `grpo_utils`

This refactoring prepares for the OLMo-core GRPO trainer by sharing configuration between both trainers.
Testing: ran the single-GPU GRPO script on Beaker.