Prerequisites
- I have read the ServerlessLLM documentation.
- I have searched the Issue Tracker to ensure this hasn't been reported before.
System Information
OS: Ubuntu
Python: 3.10
GPU: NVIDIA A100
Problem Description
I have already applied vllm.patch and can successfully run the example code with a single GPU.
However, when I try to run with more than one GPU (e.g., two GPUs) by setting the vLLM argument tensor_parallel_size, errors occur in the load_model call.
Steps to Reproduce
- Apply vllm.patch.
- Start vLLM with tensor_parallel_size > 1 (a sketch of the invocation follows this list).
- Errors occur in the load_model function call.
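For reference, this is roughly how I start the patched vLLM. The model path and the load_format value are from my local setup, not from the official example, so treat them as placeholders; the same call works when tensor_parallel_size is 1.

```python
from vllm import LLM, SamplingParams

# Works with a single GPU (tensor_parallel_size=1).
# Fails inside load_model when tensor_parallel_size > 1.
llm = LLM(
    model="/models/opt-1.3b",        # placeholder: path to a checkpoint saved in ServerlessLLM format
    load_format="serverless_llm",    # load format added by vllm.patch in my setup (assumption)
    tensor_parallel_size=2,          # setting this > 1 triggers the error
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```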
Expected Behavior
No response
Additional Context
No response
Usage Statistics (Optional)
No response