Description
Prerequisites
- I have read the ServerlessLLM documentation.
- I have searched the Issue Tracker to ensure this hasn't been reported before.
System Information
2 NVIDIA A100 GPUs
Problem Description
When using vLLM with tensor_parallel_size = 2, the model checkpoints fail to load.
Steps to Reproduce
Loading the model fails with the error "Exception in worker VllmWorkerProcess while processing method load_model".
Expected Behavior
No response
Additional Context
No response
Usage Statistics (Optional)
No response