Bump vllm to v0.14.1, which is what the hybrid PR is based on #1433

Merged
finbarrtimbers merged 14 commits into allenai/open-instruct:main from allenai/open-instruct:finbarr/update-vllm on Jan 28, 2026

Conversation

finbarrtimbers (Collaborator) commented on Jan 27, 2026

Summary

Upgrades vLLM from 0.12.0 to 0.14.1, the version the hybrid model PR is based on.

Changes

  • Updated vLLM to 0.14.1
  • Added fallback chat template for vLLM OpenAI server (fixes models without native chat templates)
  • Switched GPU tests to use Qwen/Qwen3-0.6B (fixes compatibility with vLLM 0.14)
  • Use skipTest instead of fail when generating expected test data (allows all data files to be generated in one run)
  • Fixed DPO debug script by disabling Beaker eval jobs
  • Removed custom aarch64 logic (no longer needed with vLLM 0.14)

Related PRs

(Replaces #1432 after branch rename)

Test Results

Test                   Status     Link
GPU Tests (13/13)      ✅ Passed  01KG2QMQNK2G275MTEZNZ3XTSP
GRPO Single GPU        ✅ Passed  01KG2R6MS1D53CSNYKH8GS02JV
DPO Single GPU         ✅ Passed  01KG2RP8DZZHNHWG6ER93B1JCM
GRPO Large (2x8 GPU)   ✅ Passed  01KG356QBAD4NZB8N0M6WY12AK
DPO Multi-Node         ✅ Passed  01KG356X5JMHKFN64KJV0DX262

🤖 Generated with Claude Code

finbarrtimbers and others added 4 commits January 27, 2026 08:57

Update vllm to 0.14.1 and numpy to >=2

- Simplify vllm dependency from two platform-specific lines to one
- Remove custom aarch64 wheel source (official wheels now available)
- Update numpy constraint from <2 to >=2 (required by vllm 0.14.1)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add --no_auto_dataset_cache to tools test script

Fixes launching from macOS, where vllm is not available for local caching.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix vllm 0.14 import for ToolParser

The ToolParser class moved from vllm.entrypoints.openai.tool_parsers to vllm.entrypoints.openai.tool_parsers.abstract_tool_parser in vllm 0.14.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix vllm 0.14 tool_parsers module path

In vLLM 0.14, tool_parsers was moved from vllm.entrypoints.openai.tool_parsers to vllm.tool_parsers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
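
For context, a minimal sketch of a version-tolerant import, assuming only the module paths named in these commit messages; the actual fix in open_instruct/tools/parsers.py may look different (the PR pins vLLM to 0.14.1, so it likely just switches the path):

```python
# Hypothetical sketch: import ToolParser across vLLM versions. Per the commit
# messages above, vLLM 0.14 moved the module from
# vllm.entrypoints.openai.tool_parsers to vllm.tool_parsers.
try:
    from vllm.tool_parsers import ToolParser  # vLLM >= 0.14
except ImportError:  # older vLLM releases keep the original path
    from vllm.entrypoints.openai.tool_parsers import ToolParser
```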
gemini-code-assist (Contributor) commented:
Summary of Changes

Hello @finbarrtimbers, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily updates the vllm library to version 0.14.1, which is a foundational change for upcoming hybrid model support. A significant benefit of this upgrade is the elimination of previously required custom aarch64 installation logic, streamlining the build process for different architectures. The update also includes a comprehensive refresh of several other core Python dependencies to ensure compatibility and leverage the latest features.

Highlights

  • vLLM Version Bump: The vllm library has been updated to version 0.14.1, aligning with the base for the hybrid model PRs.
  • Simplified aarch64 Support: Custom aarch64 platform-specific logic for vllm installation has been removed, streamlining dependency management for DGX Spark support.
  • Updated vLLM Tool Parser Paths: Import paths for VllmNativeToolParser and VllmParserConfig have been adjusted to reflect their new location within the vllm.tool_parsers module.
  • Dependency Upgrades: Several core Python dependencies have been upgraded, including numpy, torch, compressed-tensors, opencv-python-headless, torchaudio, torchvision, triton, and xgrammar, along with the addition of grpcio-reflection.

gemini-code-assist (Bot) left a comment

Code Review

This pull request updates the vllm dependency to version 0.14.1 and adjusts the codebase accordingly. The changes include updating import paths in open_instruct/tools/parsers.py to align with the new vllm structure, modifying pyproject.toml to use the new vllm version and removing the previous platform-specific logic for aarch64. The numpy dependency has also been updated to v2.x. The lock file and requirements.txt have been updated to reflect these changes and their transitive dependencies. The changes are clean and consistent with the goal of upgrading vllm. I see no issues with this PR.

finbarrtimbers and others added 8 commits January 28, 2026 08:00

Remove outdated expected data for regeneration after vLLM 0.14 upgrade

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update expected test data for vLLM 0.14

Regenerated expected output for GPU tests after upgrading vLLM from 0.12.0 to 0.14.1. The new version produces different generation outputs even with the same seed/temperature settings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add fallback chat template for vLLM OpenAI server

vLLM 0.14 requires a chat template during server warmup. Models without a chat template (like EleutherAI/pythia-14m, used in tests) fail with ChatTemplateResolutionError during startup.

This adds a simple fallback template that is only used when the model's tokenizer doesn't have a chat template defined. Models with native chat templates (OLMo, Qwen, etc.) are unaffected.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
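
As a rough illustration of the approach this commit describes (not the actual template shipped in the PR), a fallback Jinja chat template could be applied only when the tokenizer defines none; FALLBACK_CHAT_TEMPLATE and resolve_chat_template are hypothetical names:

```python
# Illustrative sketch only; the real template and wiring in open-instruct may differ.
FALLBACK_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}assistant:{% endif %}"
)

def resolve_chat_template(tokenizer):
    """Return the fallback template only if the tokenizer has no native one."""
    if getattr(tokenizer, "chat_template", None):
        return None  # models with their own template (OLMo, Qwen, ...) are untouched
    return FALLBACK_CHAT_TEMPLATE
```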

Switch GPU tests to use Qwen/Qwen3-0.6B

Replace Qwen/Qwen3-1.7B and EleutherAI/pythia-14m with Qwen/Qwen3-0.6B as the default model for GPU tests. This fixes the test_vllm_queue_system_single_prompt test, which was failing because vLLM 0.14 has issues with base models that lack a chat template.

Using a smaller Qwen model also speeds up tests while maintaining compatibility with vLLM 0.14's chat template requirements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add expected test data for Qwen3-0.6B (with_tools)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use skipTest instead of fail when generating expected test data

This allows all expected data files to be generated in a single test run instead of stopping at the first missing file. Tests are marked as "skipped" when generating data, and will verify on the next run.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
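
A minimal sketch of this generate-then-skip pattern with unittest; the file layout and helper names here (EXPECTED_DIR, check_against_expected) are placeholders rather than the repo's actual test code:

```python
import json
import unittest
from pathlib import Path

EXPECTED_DIR = Path("tests/expected")  # placeholder location

class GenerationTest(unittest.TestCase):
    def check_against_expected(self, name: str, output: dict) -> None:
        expected_file = EXPECTED_DIR / f"{name}.json"
        if not expected_file.exists():
            # Generate the missing file and skip instead of failing, so a single
            # run can produce every expected-data file; the next run verifies.
            EXPECTED_DIR.mkdir(parents=True, exist_ok=True)
            expected_file.write_text(json.dumps(output, indent=2))
            self.skipTest(f"generated expected data at {expected_file}")
        self.assertEqual(json.loads(expected_file.read_text()), output)
```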

Add expected test data for Qwen3-0.6B (without_tools)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix DPO debug script by disabling Beaker eval jobs

The script has push_to_hub=false, which conflicts with the default try_launch_beaker_eval_jobs=true. Explicitly disable eval jobs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

finbarrtimbers added this pull request to the merge queue on Jan 28, 2026
Merged via the queue into main with commit 7cc755f on Jan 28, 2026
7 checks passed
finbarrtimbers deleted the finbarr/update-vllm branch on Jan 28, 2026, 21:31
lukashelff pushed a commit to lukashelff/open-instruct-slurm that referenced this pull request on Feb 19, 2026:
Bump vllm to v0.14.1, which is what the hybrid PR is based on (allenai#1433)
