Pull requests: vllm-project/vllm

Correct CUDA Graph capture for encoder-decoder models (V0 engine)
#22630 · opened Aug 11, 2025 by Sugar-zsg

[V1] Enable prefill optimization for Gemma3n
#22628 · opened Aug 11, 2025 by sarckk · Labels: speculative-decoding, tpu, v1

Move SchedulerConfig from config/__init__.py to config/scheduler.py
#22626 · opened Aug 11, 2025 by hmellor

[Misc] Move jsontree to utils
#22622 · opened Aug 11, 2025 by DarkLight1337 · Labels: multi-modality, ready

[Bugfix] Fix Dense module loading for sentence-transformers embedding models v3
#22614 · opened Aug 11, 2025 by FFFfff1FFFfff · Labels: ci/build

[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI
#22611 · opened Aug 11, 2025 by 22quinn · Labels: ci-failure, llama, ready, speculative-decoding

[XPU] Add xpu torch.compile support
#22609 · opened Aug 11, 2025 by jikunshang · Labels: ci/build

[Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues
#22606 · opened Aug 10, 2025 by frankwang28 · Labels: ci/build

minor: zero workspace buffer init for flashinfer trtllm-gen attn
#22603 · opened Aug 10, 2025 by yyihuang · Labels: v1

Vectorize RMSNorm CUDA kernel
#22602 · opened Aug 10, 2025 by bbeckca · Labels: performance

[Feature] Improve logging for error messages
#22599 · opened Aug 10, 2025 by elizabetht · Labels: documentation, v1

[V1] [Hybrid] Enable Full CUDA graph by default for models with mamba2 layers in V1
#22594 · opened Aug 10, 2025 by tdoublep · Labels: new-model, ready

[V1] [Hybrid] Enable compile and piecewise CUDA graph for MiniMax-Text models
#22589 · opened Aug 10, 2025 by tdoublep

Add return_token_ids_alongside parameter to OpenAI API endpoints
#22587 · opened Aug 10, 2025 by ultmaster · Labels: frontend

Fix Ray placement group allocation not respecting env VLLM_RAY_PER_WORKER_GPUS (fractional GPU)
#22577 · opened Aug 10, 2025 by eric-higgins-ai

[Core][BugFix] Fix thread safety issue in RequestOutputCollector
#22576 · opened Aug 9, 2025 by 22quinn · Labels: ready, v1

optimize: improve scheduler policy lookup performance
#22573 · opened Aug 9, 2025 by skyloevil · Labels: v1

[Core] Use individual MM items in P0/P1 cache and model runner
#22570 · opened Aug 9, 2025 by DarkLight1337 · Labels: multi-modality, ready, tpu, v1