Pull requests: vllm-project/vllm

Pull requests list

Correct CUDA Graph capture for encoder-decoder models (V0 engine)
#22630 opened Aug 11, 2025 by Sugar-zsg
4 tasks
[V1] Enable prefill optimization for Gemma3n (labels: speculative-decoding, tpu, v1)
#22628 opened Aug 11, 2025 by sarckk
3 of 4 tasks
Support Anthropic API Endpoint (labels: frontend)
#22627 opened Aug 11, 2025 by LiuLi1998
[Misc] Move jsontree to utils (labels: multi-modality, ready)
#22622 opened Aug 11, 2025 by DarkLight1337
1 of 4 tasks
[feat] added the optimized config for Qwen3-30B-A3B Fp8 (labels: qwen)
#22618 opened Aug 11, 2025 by sara4dev Draft
4 tasks
Add EXAONE 4.0 reasoning parser (labels: deepseek, frontend, qwen)
#22617 opened Aug 11, 2025 by nuxlear
3 of 4 tasks
Upgrade FlashInfer to v0.2.11 (labels: ci/build)
#22613 opened Aug 11, 2025 by nvpohanh
3 of 4 tasks
[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (labels: ci-failure, llama, ready, speculative-decoding)
#22611 opened Aug 11, 2025 by 22quinn
3 of 4 tasks
[XPU] Add xpu torch.compile support (labels: ci/build)
#22609 opened Aug 11, 2025 by jikunshang
2 of 4 tasks
[Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues (labels: ci/build)
#22606 opened Aug 10, 2025 by frankwang28
4 tasks done
Vectorize RMSNorm CUDA kernel (labels: performance)
#22602 opened Aug 10, 2025 by bbeckca
[Feature] Improve logging for error messages (labels: documentation, v1)
#22599 opened Aug 10, 2025 by elizabetht
2 of 4 tasks
v1: Offloading connector (labels: ci/build, v1)
#22595 opened Aug 10, 2025 by orozery
[V1] [Hybrid] Enable Full CUDA graph by default for models with mamba2 layers in V1 (labels: new-model, ready)
#22594 opened Aug 10, 2025 by tdoublep
3 of 4 tasks
[Core][BugFix] Fix thread safety issue in RequestOutputCollector (labels: ready, v1)
#22576 opened Aug 9, 2025 by 22quinn
3 of 4 tasks
[Core] Use individual MM items in P0/P1 cache and model runner (labels: multi-modality, ready, tpu, v1)
#22570 opened Aug 9, 2025 by DarkLight1337
1 of 4 tasks