Bump vllm to v0.14.1, which is what the hybrid PR is based on #1433
finbarrtimbers merged 14 commits into allenai/open-instruct:main
Conversation
- Simplify vllm dependency from two platform-specific lines to one
- Remove custom aarch64 wheel source (official wheels now available)
- Update numpy constraint from <2 to >=2 (required by vllm 0.14.1)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
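For illustration, a minimal sketch of what the simplified pyproject.toml entries might look like after this change; the exact pin syntax and the surrounding fields are assumptions, not the PR's literal diff.

```toml
[project]
dependencies = [
    # Single vllm entry replacing the earlier platform-specific pair
    "vllm==0.14.1",
    # numpy constraint relaxed from <2 to >=2, as required by vllm 0.14.1
    "numpy>=2",
]
```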
Fixes launch from macOS where vllm is not available for local caching. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ToolParser class moved from vllm.entrypoints.openai.tool_parsers to vllm.entrypoints.openai.tool_parsers.abstract_tool_parser in vllm 0.14. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
In vLLM 0.14, tool_parsers was moved from vllm.entrypoints.openai.tool_parsers to vllm.tool_parsers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
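Taken together, the two fixes amount to pointing the import at the new module path. A hedged sketch of a version-tolerant import, using only the paths named in the commit messages above (the try/except ordering is an assumption, not the PR's exact code):

```python
# Import ToolParser from the vLLM 0.14 location, falling back to the pre-0.14 path.
try:
    from vllm.tool_parsers import ToolParser  # vLLM 0.14+ (per the second fix above)
except ImportError:
    from vllm.entrypoints.openai.tool_parsers import ToolParser  # vLLM < 0.14
```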
Code Review
This pull request updates the vllm dependency to version 0.14.1 and adjusts the codebase accordingly. The changes include updating import paths in open_instruct/tools/parsers.py to align with the new vllm structure, modifying pyproject.toml to use the new vllm version and removing the previous platform-specific logic for aarch64. The numpy dependency has also been updated to v2.x. The lock file and requirements.txt have been updated to reflect these changes and their transitive dependencies. The changes are clean and consistent with the goal of upgrading vllm. I see no issues with this PR.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Regenerated expected output for GPU tests after upgrading vLLM from 0.12.0 to 0.14.1. The new version produces different generation outputs even with the same seed/temperature settings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
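For context, a small sketch of fixed-seed sampling with vLLM's offline API; the prompt is a placeholder and the model name is just the one the GPU tests use. Even with temperature and seed pinned like this, generations can differ across vLLM releases, which is why the expected files had to be regenerated.

```python
from vllm import LLM, SamplingParams

# Placeholder prompt; the GPU tests in this PR use Qwen/Qwen3-0.6B.
llm = LLM(model="Qwen/Qwen3-0.6B")
params = SamplingParams(temperature=0.7, seed=42, max_tokens=64)
outputs = llm.generate(["Write a haiku about spring."], params)
print(outputs[0].outputs[0].text)
```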
vLLM 0.14 requires a chat template during server warmup. Models without a chat template (like EleutherAI/pythia-14m used in tests) fail with ChatTemplateResolutionError during startup. This adds a simple fallback template that is only used when the model's tokenizer doesn't have a chat template defined. Models with native chat templates (OLMo, Qwen, etc.) are unaffected. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
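A minimal sketch of the fallback idea, assuming a Hugging Face tokenizer; the template string and function name here are hypothetical, not the PR's exact code.

```python
from transformers import AutoTokenizer

# Hypothetical minimal Jinja template, used only when the model defines none.
FALLBACK_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)

def resolve_chat_template(model_name: str) -> str:
    """Return the model's own chat template, or the fallback if none is defined."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Models with a native template (OLMo, Qwen, ...) keep their own and are unaffected.
    if getattr(tokenizer, "chat_template", None):
        return tokenizer.chat_template
    # Base models like EleutherAI/pythia-14m have no template; supply the fallback so
    # vLLM 0.14's server warmup does not fail with ChatTemplateResolutionError.
    return FALLBACK_CHAT_TEMPLATE
```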
Replace Qwen/Qwen3-1.7B and EleutherAI/pythia-14m with Qwen/Qwen3-0.6B as the default model for GPU tests. This fixes the test_vllm_queue_system_single_prompt test which was failing because vLLM 0.14 has issues with base models that lack a chat template. Using a smaller Qwen model also speeds up tests while maintaining compatibility with vLLM 0.14's chat template requirements. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This allows all expected data files to be generated in a single test run instead of stopping at the first missing file. Tests are marked as "skipped" when generating data, and will verify on the next run. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
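Roughly how that generation path might look in a unittest-style test; the file layout and helper name are hypothetical.

```python
import json
import unittest
from pathlib import Path

class TestGeneration(unittest.TestCase):
    def assert_matches_expected(self, name: str, output: dict) -> None:
        expected_path = Path("tests/expected") / f"{name}.json"  # hypothetical location
        if not expected_path.exists():
            # Write the expected data and skip instead of failing, so a single run
            # can generate every missing file; the next run verifies against it.
            expected_path.parent.mkdir(parents=True, exist_ok=True)
            expected_path.write_text(json.dumps(output, indent=2))
            self.skipTest(f"generated expected data at {expected_path}; rerun to verify")
        self.assertEqual(json.loads(expected_path.read_text()), output)
```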
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The script has push_to_hub=false which conflicts with the default try_launch_beaker_eval_jobs=true. Explicitly disable eval jobs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…i#1433)

* Update vllm to 0.14.1 and numpy to >=2
* Add --no_auto_dataset_cache to tools test script
* Fix vllm 0.14 import for ToolParser
* Fix vllm 0.14 tool_parsers module path
* updated changelog
* Remove outdated expected data for regeneration after vLLM 0.14 upgrade
* Update expected test data for vLLM 0.14
* Add fallback chat template for vLLM OpenAI server
* Switch GPU tests to use Qwen/Qwen3-0.6B
* Add expected test data for Qwen3-0.6B (with_tools)
* Use skipTest instead of fail when generating expected test data
* Add expected test data for Qwen3-0.6B (without_tools)
* Fix DPO debug script by disabling Beaker eval jobs

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Upgrades vLLM from 0.12.0 to 0.14.1 for hybrid model support.
Changes
- skipTest instead of fail when generating expected test data (allows all data files to be generated in one run)

Related PRs
(Replaces #1432 after branch rename)
Test Results
🤖 Generated with Claude Code