[BUG] Does not support using multiple GPUs in current vLLM patch #157

Open
@attteegood

Description


Prerequisites

System Information

OS: Ubuntu
Python: 3.10
GPU: NVIDIA A100

Problem Description

I have already applied vllm.patch and can successfully run the example code on a single GPU.
However, when I try to run with more than one GPU (e.g., two GPUs) by passing the vLLM argument tensor_parallel_size, errors occur in load_model.

Steps to Reproduce

  1. Apply vllm.patch.
  2. Start vLLM with tensor_parallel_size > 1 (see the sketch below).
  3. Errors occur in the load_model function call.
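
A minimal reproduction sketch of the failing setup, assuming the standard vLLM offline-inference API; the model name and prompt are placeholders, not from the original report:

```python
# Minimal sketch of the failing configuration.
# Assumptions: vllm.patch has already been applied to the local vLLM install,
# and the model name below is a placeholder chosen for illustration.
from vllm import LLM, SamplingParams

# With tensor_parallel_size=1 this works after applying vllm.patch.
# Setting tensor_parallel_size > 1 (e.g., 2) reproduces the error in load_model.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model
    tensor_parallel_size=2,            # >1 triggers the failure
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
for out in outputs:
    print(out.outputs[0].text)
```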

Expected Behavior

No response

Additional Context

No response

Usage Statistics (Optional)

No response
