Bump vllm to v0.14.1, which is what the hybrid PR is based on #1433
finbarrtimbers merged 14 commits into allenai/open-instruct:main
Conversation
- Simplify vllm dependency from two platform-specific lines to one
- Remove custom aarch64 wheel source (official wheels now available)
- Update numpy constraint from <2 to >=2 (required by vllm 0.14.1)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
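For illustration, a minimal sketch of what the simplified pyproject.toml entries might look like after this change; the exact pin syntax and the surrounding fields are assumptions, not the PR's literal diff.

```toml
[project]
dependencies = [
    # Single vllm entry replacing the earlier platform-specific pair
    "vllm==0.14.1",
    # numpy constraint relaxed from <2 to >=2, as required by vllm 0.14.1
    "numpy>=2",
]
```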
Fixes launch from macOS where vllm is not available for local caching. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ToolParser class moved from vllm.entrypoints.openai.tool_parsers to vllm.entrypoints.openai.tool_parsers.abstract_tool_parser in vllm 0.14. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
In vLLM 0.14, tool_parsers was moved from vllm.entrypoints.openai.tool_parsers to vllm.tool_parsers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
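Taken together, the two fixes amount to pointing the import at the new module path. A hedged sketch of a version-tolerant import, using only the paths named in the commit messages above (the try/except ordering is an assumption, not the PR's exact code):

```python
# Import ToolParser from the vLLM 0.14 location, falling back to the pre-0.14 path.
try:
    from vllm.tool_parsers import ToolParser  # vLLM 0.14+ (per the second fix above)
except ImportError:
    from vllm.entrypoints.openai.tool_parsers import ToolParser  # vLLM < 0.14
```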
Code Review
This pull request updates the vllm dependency to version 0.14.1 and adjusts the codebase accordingly. The changes include updating import paths in open_instruct/tools/parsers.py to align with the new vllm structure, modifying pyproject.toml to use the new vllm version and removing the previous platform-specific logic for aarch64. The numpy dependency has also been updated to v2.x. The lock file and requirements.txt have been updated to reflect these changes and their transitive dependencies. The changes are clean and consistent with the goal of upgrading vllm. I see no issues with this PR.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Regenerated expected output for GPU tests after upgrading vLLM from 0.12.0 to 0.14.1. The new version produces different generation outputs even with the same seed/temperature settings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
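For context, a small sketch of fixed-seed sampling with vLLM's offline API; the prompt is a placeholder and the model name is just the one the GPU tests use. Even with temperature and seed pinned like this, generations can differ across vLLM releases, which is why the expected files had to be regenerated.

```python
from vllm import LLM, SamplingParams

# Placeholder prompt; the GPU tests in this PR use Qwen/Qwen3-0.6B.
llm = LLM(model="Qwen/Qwen3-0.6B")
params = SamplingParams(temperature=0.7, seed=42, max_tokens=64)
outputs = llm.generate(["Write a haiku about spring."], params)
print(outputs[0].outputs[0].text)
```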
vLLM 0.14 requires a chat template during server warmup. Models without a chat template (like EleutherAI/pythia-14m used in tests) fail with ChatTemplateResolutionError during startup. This adds a simple fallback template that is only used when the model's tokenizer doesn't have a chat template defined. Models with native chat templates (OLMo, Qwen, etc.) are unaffected. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
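A minimal sketch of the fallback idea, assuming a Hugging Face tokenizer; the template string and function name here are hypothetical, not the PR's exact code.

```python
from transformers import AutoTokenizer

# Hypothetical minimal Jinja template, used only when the model defines none.
FALLBACK_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)

def resolve_chat_template(model_name: str) -> str:
    """Return the model's own chat template, or the fallback if none is defined."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Models with a native template (OLMo, Qwen, ...) keep their own and are unaffected.
    if getattr(tokenizer, "chat_template", None):
        return tokenizer.chat_template
    # Base models like EleutherAI/pythia-14m have no template; supply the fallback so
    # vLLM 0.14's server warmup does not fail with ChatTemplateResolutionError.
    return FALLBACK_CHAT_TEMPLATE
```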
Replace Qwen/Qwen3-1.7B and EleutherAI/pythia-14m with Qwen/Qwen3-0.6B as the default model for GPU tests. This fixes the test_vllm_queue_system_single_prompt test which was failing because vLLM 0.14 has issues with base models that lack a chat template. Using a smaller Qwen model also speeds up tests while maintaining compatibility with vLLM 0.14's chat template requirements. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This allows all expected data files to be generated in a single test run instead of stopping at the first missing file. Tests are marked as "skipped" when generating data, and will verify on the next run. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
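Roughly how that generation path might look in a unittest-style test; the file layout and helper name are hypothetical.

```python
import json
import unittest
from pathlib import Path

class TestGeneration(unittest.TestCase):
    def assert_matches_expected(self, name: str, output: dict) -> None:
        expected_path = Path("tests/expected") / f"{name}.json"  # hypothetical location
        if not expected_path.exists():
            # Write the expected data and skip instead of failing, so a single run
            # can generate every missing file; the next run verifies against it.
            expected_path.parent.mkdir(parents=True, exist_ok=True)
            expected_path.write_text(json.dumps(output, indent=2))
            self.skipTest(f"generated expected data at {expected_path}; rerun to verify")
        self.assertEqual(json.loads(expected_path.read_text()), output)
```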
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The script has push_to_hub=false which conflicts with the default try_launch_beaker_eval_jobs=true. Explicitly disable eval jobs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…i#1433)

* Update vllm to 0.14.1 and numpy to >=2
* Add --no_auto_dataset_cache to tools test script
* Fix vllm 0.14 import for ToolParser
* Fix vllm 0.14 tool_parsers module path
* updated changelog
* Remove outdated expected data for regeneration after vLLM 0.14 upgrade
* Update expected test data for vLLM 0.14
* Add fallback chat template for vLLM OpenAI server
* Switch GPU tests to use Qwen/Qwen3-0.6B
* Add expected test data for Qwen3-0.6B (with_tools)
* Use skipTest instead of fail when generating expected test data
* Add expected test data for Qwen3-0.6B (without_tools)
* Fix DPO debug script by disabling Beaker eval jobs

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Upgrades vLLM from 0.12.0 to 0.14.1 for hybrid model support.
Changes
- skipTest instead of fail when generating expected test data (allows all data files to be generated in one run)

Related PRs
(Replaces #1432 after branch rename)
Test Results
🤖 Generated with Claude Code