Prerequisites
- I have read the ServerlessLLM documentation.
- I have searched the Issue Tracker to ensure this hasn't been reported before.
System Information
OS: Ubuntu
Python: 3.10
GPU: NVIDIA A100
Problem Description
I have already applied vllm.patch and can successfully run the example code with a single GPU.
However, when I try to run with more than one GPU (e.g., two GPUs) by setting the vLLM argument tensor_parallel_size, errors occur in the load_model call.
Steps to Reproduce
- Apply vllm.patch.
- Start vLLM with tensor_parallel_size > 1 (a sketch of the invocation follows this list).
- Errors occur in the load_model function call.
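For reference, this is roughly how I start the patched vLLM. The model path and the load_format value are from my local setup, not from the official example, so treat them as placeholders; the same call works when tensor_parallel_size is 1.

```python
from vllm import LLM, SamplingParams

# Works with a single GPU (tensor_parallel_size=1).
# Fails inside load_model when tensor_parallel_size > 1.
llm = LLM(
    model="/models/opt-1.3b",        # placeholder: path to a checkpoint saved in ServerlessLLM format
    load_format="serverless_llm",    # load format added by vllm.patch in my setup (assumption)
    tensor_parallel_size=2,          # setting this > 1 triggers the error
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```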
Expected Behavior
No response
Additional Context
No response
Usage Statistics (Optional)
No response