Status: Closed
Labels: bug
Description
Your current environment
==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.5.2
[pip3] mypy==1.18.2
[pip3] mypy-extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.16.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.0
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0
[pip3] torch-c-dlpack-ext==0.1.3
[pip3] torchaudio==2.9.0
[pip3] torchvision==0.24.0
[pip3] transformers==4.57.2
[pip3] triton==3.5.0
[conda] Could not collect
🐛 Describe the bug
When using the fastsafetensors plugin, VRAM usage appears to be significantly higher in tensor-parallel mode.
I am unable to load Qwen3-VL-235B-A22B-Instruct-FP8 with the parameters from the recipe at https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3-VL.html. On four 96GiB H100s, the following command fails once --load-format fastsafetensors is added:
vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 \
--tensor-parallel-size 4 \
--limit-mm-per-prompt.video 0 \
--async-scheduling \
--gpu-memory-utilization 0.95 \
--max-num-seqs 128
Versions: vLLM 0.11.2, fastsafetensors 0.1.15.
Reducing the context size and --max-num-seqs did not help either.
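For context, a rough per-GPU memory budget for this setup can be sketched as below. This is a back-of-the-envelope estimate only, assuming about 235e9 parameters stored at 1 byte each (FP8) and split evenly across the 4 tensor-parallel ranks; in practice some tensors may be kept in higher precision and the usable memory per card is somewhat below the nominal 96 GiB. Under these assumptions the weights take roughly 55 GiB per GPU out of a ~91 GiB budget, leaving about 36 GiB for activations, CUDA graphs and KV cache, so a per-rank loading overhead of that order would be enough to cause a failure.

```python
# Back-of-the-envelope per-GPU memory budget for the reported setup.
# Assumptions (not measured): ~235e9 parameters, all at 1 byte (FP8),
# sharded evenly across tensor-parallel ranks.
GiB = 1024 ** 3

num_params = 235e9      # from the model name "235B"
bytes_per_param = 1     # FP8 assumption; some tensors may be larger in reality
tp_size = 4             # --tensor-parallel-size 4
gpu_mem_gib = 96        # "four 96GiB H100s" (nominal capacity)
gpu_mem_util = 0.95     # --gpu-memory-utilization 0.95

# Weight bytes held by each rank after sharding.
weights_per_gpu_gib = num_params * bytes_per_param / tp_size / GiB
# Portion of each GPU that vLLM is allowed to use.
budget_per_gpu_gib = gpu_mem_gib * gpu_mem_util
# What remains for activations, CUDA graphs and KV cache.
headroom_gib = budget_per_gpu_gib - weights_per_gpu_gib

print(f"weights per GPU : {weights_per_gpu_gib:.1f} GiB")
print(f"budget per GPU  : {budget_per_gpu_gib:.1f} GiB")
print(f"headroom        : {headroom_gib:.1f} GiB")
```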