Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4)#1396

Merged
finbarrtimbers merged 6 commits into allenai/open-instruct:main from finbarr/grpo-utils-config
Jan 20, 2026

Conversation

@finbarrtimbers (Collaborator) commented Jan 20, 2026

Summary

  • Extract ExperimentConfig dataclass from grpo_fast.py to new grpo_utils.py module
  • Update grpo_fast.py to import from grpo_utils
  • Update benchmark_generators.py to import from grpo_utils

This refactoring prepares for the OLMo-core GRPO trainer by sharing configuration between both trainers.

Ran the single-GPU GRPO script: Beaker.
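The module split described in the summary can be sketched generically. This is a toy illustration of the refactor pattern only; the class and field names below are invented stand-ins, not the actual open_instruct code:

```python
# Toy sketch: a config dataclass moves out of the trainer module into a
# shared utils module so two trainers can accept the same config type.
# All names and fields here are illustrative, not the real PR code.
from dataclasses import dataclass


# --- would live in grpo_utils.py (shared) ---
@dataclass
class ExperimentConfig:
    learning_rate: float = 1e-6
    beta: float = 0.0


# --- would live in grpo_fast.py (existing DeepSpeed trainer) ---
def train_fast(config: ExperimentConfig) -> str:
    return f"fast trainer, lr={config.learning_rate}"


# --- would live in grpo.py (upcoming OLMo-core trainer) ---
def train_olmo_core(config: ExperimentConfig) -> str:
    return f"olmo-core trainer, lr={config.learning_rate}"
```

Both trainers then share one configuration definition, which is the stated goal of this PR.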

@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @finbarrtimbers, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the experiment configuration by extracting the ExperimentConfig dataclass into its own utility module. This change is a preparatory step to enable sharing of configuration settings between different GRPO trainers, specifically for the upcoming OLMo-core GRPO trainer implementation.

Highlights

  • Configuration Extraction: The ExperimentConfig dataclass has been moved from grpo_fast.py into a new, dedicated module grpo_utils.py to centralize configuration definitions.
  • Module Imports Updated: Both grpo_fast.py and benchmark_generators.py have been updated to import ExperimentConfig from the new grpo_utils module, replacing direct references to grpo_fast.Args.
  • Type Hint Refactoring: Numerous function signatures across grpo_fast.py and benchmark_generators.py have been updated to reflect the new grpo_utils.ExperimentConfig type hint for configuration arguments.



Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment


Code Review

This pull request successfully extracts the ExperimentConfig dataclass from grpo_fast.py to a new grpo_utils.py module, and updates all references accordingly. This is a good step towards modularizing the codebase and preparing for future trainer implementations. The changes are well-contained and the type hint updates are consistent across the affected files.

Comment thread on open_instruct/grpo_utils.py (outdated), lines +92 to +131:

```python
def __post_init__(self):
    if self.use_vllm_logprobs and self.truncated_importance_sampling_ratio_cap > 0.0:
        raise ValueError(
            "Cannot use both `use_vllm_logprobs` and `truncated_importance_sampling_ratio_cap`. "
            "use_vllm_logprobs sets old_logprobs to vLLM logprobs, making importance sampling pointless."
        )
    self.loss_denominator = utils.get_denominator(self.loss_denominator)
    if self.checkpoint_state_freq > 0 and self.checkpoint_state_dir is None:
        raise ValueError("`checkpoint_state_dir` must be provided if `checkpoint_state_freq` is greater than 0!")
    if self.checkpoint_state_dir is not None and self.checkpoint_state_freq == -1:
        raise ValueError("`checkpoint_state_freq` must be greater than 0 if `checkpoint_state_dir` is provided!")

    if self.gs_checkpoint_state_dir is not None and not self.gs_checkpoint_state_dir.startswith("gs://"):
        raise ValueError(f"`gs_checkpoint_state_dir` must start with 'gs://', got: {self.gs_checkpoint_state_dir}")
    if self.gs_bucket_path is not None and not self.gs_bucket_path.startswith("gs://"):
        raise ValueError(f"`gs_bucket_path` must start with 'gs://', got: {self.gs_bucket_path}")
    if self.sequence_parallel_size > 1 and self.deepspeed_stage != 3:
        raise ValueError("`sequence_parallel_size` > 1 requires `deepspeed_stage` to be 3!")

    if self.gs_bucket_path is not None and self.gs_checkpoint_state_dir is None:
        if self.checkpoint_state_dir is None:
            raise ValueError("`checkpoint_state_dir` must be provided when using `gs_bucket_path`!")
        checkpoint_dir_name = self.checkpoint_state_dir.rstrip("/")
        beaker_users = get_beaker_whoami()
        if beaker_users is not None:
            self.gs_checkpoint_state_dir = f"{self.gs_bucket_path}/{beaker_users}/{checkpoint_dir_name}"
        else:
            self.gs_checkpoint_state_dir = f"{self.gs_bucket_path}/{checkpoint_dir_name}"
        if not checkpoint_dir_name.startswith("/filestore"):
            self.checkpoint_state_dir = f"/filestore{self.checkpoint_state_dir}"

    if self.checkpoint_state_dir is not None:
        if self.gs_checkpoint_state_dir is not None:
            download_latest_checkpoint_from_gs(self.gs_checkpoint_state_dir, self.checkpoint_state_dir)
        calibrate_checkpoint_state_dir(self.checkpoint_state_dir)
    if not self.load_ref_policy and self.beta != 0.0:
        raise ValueError(
            "When load_ref_policy=False, beta must be 0.0. "
            f"Got beta={self.beta}. Set --beta 0.0 or --load_ref_policy to use KL penalty."
        )
```

Severity: medium

The __post_init__ method of ExperimentConfig contains logic that performs side effects, such as calling get_beaker_whoami, download_latest_checkpoint_from_gs, and calibrate_checkpoint_state_dir. While this logic was present in the original Args class, placing operational logic directly within a dataclass's __post_init__ can make the configuration less pure and introduce unexpected dependencies or side effects when simply creating an ExperimentConfig object. It's generally better to separate configuration definition from operational logic to improve modularity and testability.
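The separation the review suggests could look like the following minimal sketch. The class, function, and callback names here are hypothetical (not from the PR): validation stays pure inside `__post_init__`, while side effects move to an explicit setup function whose downloader is injected so tests can stub it out:

```python
# Sketch of splitting pure validation from operational side effects.
# CheckpointConfig and prepare_checkpointing are invented names.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckpointConfig:
    checkpoint_state_dir: Optional[str] = None
    checkpoint_state_freq: int = -1

    def __post_init__(self):
        # Validation only: no I/O, no environment lookups, safe to run in tests.
        if self.checkpoint_state_freq > 0 and self.checkpoint_state_dir is None:
            raise ValueError("`checkpoint_state_dir` must be provided if `checkpoint_state_freq` > 0!")


def prepare_checkpointing(config: CheckpointConfig, download: Callable[[str], None]) -> bool:
    # Side effects happen here, explicitly, after the config object exists.
    # Returns True if a checkpoint download was triggered.
    if config.checkpoint_state_dir is not None:
        download(config.checkpoint_state_dir)
        return True
    return False
```

With this shape, constructing a config never touches the network or the filesystem, and the operational step is a visible call site rather than a hidden consequence of instantiation.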

```python
loss_fn: Literal["dapo", "cispo"] = "dapo"
record_entropy: bool = False
use_vllm_logprobs: bool = False
temperature: float = field(default=1.0, init=False)
```

Severity: medium

The temperature field is defined with init=False, meaning it's not part of the constructor, but its value is fixed to 1.0. There's no logic within this class to dynamically set or modify this value based on other parameters. Given that StreamingDataLoaderConfig also has a temperature field, this duplication and the init=False here can be confusing. If this temperature is meant to be a fixed default for the experiment, it should either be removed from ExperimentConfig (if StreamingDataLoaderConfig.temperature is the authoritative source) or explicitly used/validated within ExperimentConfig.
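The `init=False` behavior the review points at can be demonstrated with a minimal dataclass (a toy class, not the PR's code): the field is excluded from the generated `__init__`, so callers can never override it at construction time.

```python
# field(default=1.0, init=False) removes temperature from the generated
# __init__: it is always 1.0 unless code mutates it after construction.
from dataclasses import dataclass, field


@dataclass
class Demo:  # toy stand-in for ExperimentConfig
    record_entropy: bool = False
    temperature: float = field(default=1.0, init=False)


d = Demo(record_entropy=True)
print(d.temperature)  # 1.0

# Demo(temperature=0.7) would raise:
# TypeError: __init__() got an unexpected keyword argument 'temperature'
```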

finbarrtimbers and others added 6 commits January 20, 2026 15:53
…ion: PR 1 of 4)

This refactoring extracts the shared configuration class that both grpo_fast.py
(existing DeepSpeed trainer) and the upcoming grpo.py (new OLMo-core trainer) need.

- Create grpo_utils.py with ExperimentConfig dataclass (moved from grpo_fast.py Args)
- Update grpo_fast.py to import from grpo_utils
- Update benchmark_generators.py to import from grpo_utils

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@finbarrtimbers force-pushed the finbarr/grpo-utils-config branch from f1f3628 to 691668a on January 20, 2026, 22:53
@natolambert (Collaborator) left a comment


LGTM. Appreciate the bite-sized PRs, not big ones.

@finbarrtimbers added this pull request to the merge queue Jan 20, 2026
@finbarrtimbers (Author) replied:

> LGTM. Appreciate the bite-sized PRs, not big ones.

At DeepMind, the senior engineers would yell at you for big PRs haha.

Merged via the queue into main with commit 71e150c Jan 20, 2026
6 of 7 checks passed
@finbarrtimbers deleted the finbarr/grpo-utils-config branch January 20, 2026 23:43
@finbarrtimbers restored the finbarr/grpo-utils-config branch January 21, 2026 14:22
sang1583535 pushed a commit to sang1583535/open-instruct that referenced this pull request Feb 3, 2026
…ion: PR 1 of 4) (allenai#1396)

* Extract ExperimentConfig to grpo_utils.py (GRPO olmo-core implementation: PR 1 of 4)

This refactoring extracts the shared configuration class that both grpo_fast.py
(existing DeepSpeed trainer) and the upcoming grpo.py (new OLMo-core trainer) need.

- Create grpo_utils.py with ExperimentConfig dataclass (moved from grpo_fast.py Args)
- Update grpo_fast.py to import from grpo_utils
- Update benchmark_generators.py to import from grpo_utils

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* a bunch of claude tweaks

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Cleaned up PR.

* Added comment to dataset filter.

* updated changelog

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
lukashelff pushed a commit to lukashelff/open-instruct-slurm that referenced this pull request Feb 19, 2026