122 commits
657e9c0
[Fix] Introduce audio channels spec (#31595)
jeremyteboul Jan 9, 2026
a4d5d66
Add unpermute-aware fused MoE path and small-batch fallback (#29354)
RunkaiTao Jan 9, 2026
f9e2a75
[fix] add cutedsl to global sf (#32001)
jiahanc Jan 9, 2026
0a0aa07
[Quant] Make static quant support all group shapes (#30833)
LucasWilkinson Jan 9, 2026
1f8b7c5
[responsesAPI] fix incomplete_messages for simple/parsable context (#…
qandrew Jan 9, 2026
2612ba9
[1/N][Attention] Restructure attention: move files (#31916)
MatthewBonanni Jan 9, 2026
9457812
[NIXL] refine decoder side post process for heterogeneous BlockSize a…
xuechendi Jan 9, 2026
97ba96f
[perf][async] support non cpu sync get logprob tensors for spec (#31336)
izhuhaoran Jan 9, 2026
3adffd5
[Misc] Enable async scheduling by default with spec decoding (#31998)
njhill Jan 9, 2026
aaf4b70
[Misc][BE] Type coverage for vllm/compilation [2/3] (#31744)
Lucaskabela Jan 9, 2026
0308901
[2/N][Attention] Fix pre-commit errors (#32052)
MatthewBonanni Jan 10, 2026
1963245
[Core] Use weights_only=True with torch.load (#32045)
russellb Jan 10, 2026
e18464a
[Perf] Optimize async scheduling placeholder using empty (#32056)
yewentao256 Jan 10, 2026
ac0675f
[CI] Allow Deprecated Quantization For LM Eval Tests (#32065)
micah-wil Jan 10, 2026
4dc0d60
[Bugfix] Narrow broad exceptions in compilation backends (#31616)
c0de128 Jan 10, 2026
abd9224
resolve pydantic error in startup benchmark (#31348)
andyxning Jan 10, 2026
ea6d067
[Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709)
Lucaskabela Jan 10, 2026
e45946b
feature/issac 0.2 (#31550)
AkshatSh Jan 10, 2026
80fead8
Fuse RoPE and MLA KV-cache write (#25774)
PatrykSaffer Jan 10, 2026
c60578d
[Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_p…
c0de128 Jan 10, 2026
52d4282
[Core] Refactor ColumnParallelLinear: remove unused parameter and opt…
maang-h Jan 10, 2026
583a90e
[Refactor] Separate sequence and token pooling types (#32026)
DarkLight1337 Jan 10, 2026
0c96148
Update modelopt KV cache quantization resolution to new scheme (#31895)
roikoren755 Jan 10, 2026
d83becd
[ROCm][CI] Fix flaky `test_function_calling_with_stream` and reduce s…
AndreasKaratzas Jan 10, 2026
da6709c
[Misc] Delay deprecation of CommonAttentionMetadata properties (#32074)
LucasWilkinson Jan 10, 2026
a01a1c0
[Bugfix] fix encoder cache leak of waiting requests in scheduler to s…
frelam Jan 10, 2026
5f2385a
[Benchmark][1/2] Generalize SLA criterion validation from binary flag…
DarkLight1337 Jan 10, 2026
14fc7a6
[Bugfix] fix offline chat output prompt (#32076)
andyxning Jan 10, 2026
07286ec
[Bugfix] Fix integer overflow in Gemma3n audio processing (#31657)
jeremyteboul Jan 10, 2026
e6c6f2c
[Quant] Support MXFP4 W4A16 for compressed-tensors dense models (#31926)
mgoin Jan 10, 2026
b8bf5c4
[Kernel] Optimize Sliding Window Attention in 3D Triton Kernel (#31984)
jvlunteren Jan 10, 2026
543c23b
[LoRA][Perf] Improve FusedMoE LoRA performance for small rank (#32019)
xyang16 Jan 10, 2026
d1fd802
fused_moe_kernel - cast accumulator after applying router weights (#3…
gnovack Jan 10, 2026
0285997
[BugFix] scheduler: Fix resuming of preempted requests after async lo…
orozery Jan 10, 2026
1c46dea
Revert "[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#308…
shyeh25 Jan 10, 2026
6ea001c
[Bugfix][Quantization] Ensure input contiguity in per_token_quant_int…
Flink-ddd Jan 10, 2026
e15a5ff
[MISC] Add strict contiguity check for FlashInfer attention tensors (…
vadiklyutiy Jan 10, 2026
8020a60
[Bugfix] Fix Qwen3-VL-Reranker model loading for sequence classificat…
ricky-chaoju Jan 10, 2026
2a4dbe2
[BugFix] Wait for compute before offloading KV to CPU (#31341)
orozery Jan 10, 2026
ef96fa3
[Benchmark][2/2] Use spline interpolation to tune SLA variables (#32095)
DarkLight1337 Jan 11, 2026
0dd6363
[MTP][GLM][Bugfix] Fixed .weight_scale loading logic that dropped MTP…
andyl98 Jan 11, 2026
46eb30f
make assume_32_bit_indexing configurable (#32044)
laithsakka Jan 11, 2026
9103ed1
[CPU][BugFix] Disable AOT Compile for CPU (#32037)
fadara01 Jan 11, 2026
bde57ab
[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
mawong-amd Jan 11, 2026
4c16ba6
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions…
orozery Jan 11, 2026
cee7436
[Misc] Make `scipy` as optional audio/benchmark dependency (#32096)
Isotr0py Jan 11, 2026
a374532
[CI/Build] Separate out flaky responses API tests (#32110)
DarkLight1337 Jan 11, 2026
d70249e
[Misc] fix this log format not space (#32112)
lengrongfu Jan 11, 2026
a34abc4
[FixBug] Improve exception string in `tensorizer.py` (#31680)
maang-h Jan 11, 2026
d74132c
fix offline inference chat response prompt (#32088)
andyxning Jan 11, 2026
3df619a
[CI] fix `test_concat_and_cache_mla_rope_fused` (#32117)
ZJY0516 Jan 11, 2026
19504ac
[Model Runner V2] Skip building deprecated fields in attn metadata (#…
WoosukKwon Jan 11, 2026
025a32f
[Model Runner V2] Remove async barrier (#32083)
WoosukKwon Jan 12, 2026
9101dc7
[Model] Avoid hardcoding pooling type (#32119)
DarkLight1337 Jan 12, 2026
60446cd
[Model] Improve multimodal pooling examples (#32085)
noooop Jan 12, 2026
600aaab
[Model] Remove incorrect `SupportsPP` from MTP models (#32150)
DarkLight1337 Jan 12, 2026
22970c1
[Misc] Disable default `--ready-check-timeout-sec` extra call in vllm…
NickLucche Jan 12, 2026
5e034f2
[cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend (#32092)
andikarachman Jan 12, 2026
d7b2e57
[Frontend] Fix Flaky MCP Streaming Test (#32153)
daniel-salib Jan 12, 2026
899541b
[doc] fix broken links (#32158)
minimAluminiumalism Jan 12, 2026
05e8981
[Doc] Improve LoRA docs (#32159)
jeejeelee Jan 12, 2026
a5f89ae
[Doc] Add documentation for offline API docs feature (#32134)
ricky-chaoju Jan 12, 2026
9dbe1fe
[Bugfix] Fix missing scale passing for encoder Triton Attention imple…
Isotr0py Jan 12, 2026
0565f1f
[P/D] Refactor mooncake connector sender thread using async coroutine…
dtcccc Jan 12, 2026
49e6b86
[Feature] Support recording expert indices for rollout router replay …
xhx1022 Jan 12, 2026
9cddbdb
OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
orozery Jan 12, 2026
e68b0da
doc: Update model name for Qwen3-Coder in documentation (#32185)
andyzhangx Jan 12, 2026
0346396
[ROCm] [Bugfix] Fix order of mori build in Dockerfile.rocm_base (#32179)
tjtanaa Jan 12, 2026
95e53d9
doc: Update model references in supported_models.md (#32188)
andyzhangx Jan 12, 2026
63ed240
Add K-EXAONE-236B-A23B (#31621)
lkm2835 Jan 12, 2026
6bc9c84
[MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29…
kakao-steve-ai Jan 12, 2026
3f72639
[FIX] Add NO_MUL activation support for modular kernel path (#31528)
danielafrimi Jan 12, 2026
8863c2b
[Model] Standardize pooling heads (#32148)
DarkLight1337 Jan 12, 2026
8fb2c13
[Bugfix] Fix stale SSM state for new Mamba requests scheduled as deco…
Josephasafg Jan 12, 2026
5b68107
[Misc][PD] Fix `get_attn_backend` usage in transfer connectors (#31988)
NickLucche Jan 12, 2026
7c0d3c5
[Benchmark] Share data between SLA runs (#32184)
DarkLight1337 Jan 12, 2026
20228cb
[3/N][Attention] Move AttentionMetadata-related code from utils.py to…
MatthewBonanni Jan 12, 2026
3d962d7
[BugFix] fix FusedMoE.make_expert_params_mapping in EXAONE-MoE (#32196)
lkm2835 Jan 12, 2026
1eb61ab
[Refactor] EPLB rebalance algo to NumPy (#30697)
ilmarkov Jan 12, 2026
16abe6b
[Misc] Set default torch num threads for input processing (#31879)
ywang96 Jan 12, 2026
2be765b
[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173)
orozery Jan 12, 2026
08e8e99
[Misc] Change log level for batch queue log (#32192)
NickLucche Jan 12, 2026
ad8818b
[Misc][BE] Type coverage for vllm/compilation [3/3] (#31748)
Lucaskabela Jan 12, 2026
ca81811
[Model Runner V2] Support logit_bias, allowed_token_ids, min_tokens (…
WoosukKwon Jan 12, 2026
f8bd839
[NIXL][Bugfix] Failure logging overhaul + early metadata free on fail…
NickLucche Jan 12, 2026
9f430c9
[BUGFIX] Add missed remaping of the names of fp8 kv-scale (#32199)
vadiklyutiy Jan 12, 2026
dec2868
[Model Runner V2] Minor refactor for logit_bias (#32209)
WoosukKwon Jan 12, 2026
0a7dd23
[Model Runner V2] Add support for M-RoPE (#32143)
WoosukKwon Jan 12, 2026
629584b
[Kernel][MoE] fix computation order of MoE weight multiplication and …
xuebwang-amd Jan 12, 2026
a28d9f4
[ROCm][CI] Handle pytest status code 5 when a shard isn't allocated a…
divakar-amd Jan 12, 2026
a307ac0
[responsesAPI] add unit test for optional function tool call id (#32036)
qandrew Jan 13, 2026
78d13ea
[Model] Handle `trust_remote_code` for transformers backend (#32194)
DarkLight1337 Jan 13, 2026
9273a42
[Misc] Allow enabling NCCL for DP sync when async scheduling (#32197)
njhill Jan 13, 2026
c6bb5b5
[BugFix] Fix engine crash caused by chat tools + response_format (#32…
njhill Jan 13, 2026
15b33ff
[Misc] improve warning/assert messages (#32226)
cjackal Jan 13, 2026
60b77e1
[Frontend] Add `reasoning_effort` to `OpenAIServing._preprocess_chat(…
sanghoon-yn Jan 13, 2026
f243abc
Fix various typos found in `docs` (#32212)
potatosalad Jan 13, 2026
2a719e0
[Perf] Optimize requests abort (#32211)
yewentao256 Jan 13, 2026
11b6af5
[ROCm][Bugfix] Fix Mamba batched decode producing incorrect output (#…
AndreasKaratzas Jan 13, 2026
5e714f7
[ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac …
AndreasKaratzas Jan 13, 2026
80221e1
[BugFix]Fix eagle draft_model_config and add tests (#31753)
charlotte12l Jan 13, 2026
44c34f2
[Doc] Update installation from source command (#32239)
esmeetu Jan 13, 2026
df7e127
[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessin…
AndreasKaratzas Jan 13, 2026
542a405
[Model] Use mm_position to compute mrope positions for Qwen2-VL/2.5-V…
YunzhuLu Jan 13, 2026
eb28e80
[Refactor] Remove `get_encoder_dummy_data` (#32241)
DarkLight1337 Jan 13, 2026
232214b
[Bugfix] Replace `PoolingParams.normalize` with `use_activation` (#32…
DarkLight1337 Jan 13, 2026
8c8653b
[Docs] Nixl Usage recommend `fail` kv_load_failure_policy (#32198)
NickLucche Jan 13, 2026
a5bbbd2
[Quantization] fix: overflow with static per-tensor scaling (#29867)
mickaelseznec Jan 13, 2026
fefce49
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving …
chaunceyjiang Jan 13, 2026
98f60e5
[6/N][Attention] Move utils to more appropriate locations (#32215)
MatthewBonanni Jan 13, 2026
252c011
[Refactor] Remove `MultiModalProfiler` (#32254)
DarkLight1337 Jan 13, 2026
4f02cb2
[Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251)
chaunceyjiang Jan 13, 2026
5102654
[BugFix] [KVConnector] Fix KV events for LMCache connector (#32169)
hickeyma Jan 13, 2026
4f3676e
nixl_connector: export UCX_MEM_MMAP_HOOK_MODE=none to avoid a UCX mem…
hasB4K Jan 13, 2026
2263d44
[4/N][Attention] Move MLA common to model_executor (#32060)
MatthewBonanni Jan 13, 2026
ab74b2a
[Trivial] Remove duplicate enable_mfu_metrics (#32246)
markmc Jan 13, 2026
6beef12
[EPLB][Cleanup] Remove `is_async_enabled` from `EplbModelState` (#32050)
SageMoore Jan 13, 2026
af54d2e
[responseAPI] support partial message generation (#32100)
qandrew Jan 13, 2026
46f8c6b
Fix CUDA 13 wheel installation doc (#32276)
dmitry-tokarev-nv Jan 13, 2026
f28125d
[Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improveme…
yewentao256 Jan 13, 2026
69f8a0e
fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe…
rabi Jan 13, 2026
af677f5
add mxfp4 ct moe support
dsikka Jan 9, 2026
1 change: 1 addition & 0 deletions .buildkite/lm-eval-harness/test_lm_eval_correctness.py
@@ -60,6 +60,7 @@ def launch_lm_eval(eval_config, tp_size):
         f"add_bos_token=true,"
         f"trust_remote_code={trust_remote_code},"
         f"max_model_len={max_model_len},"
+        "allow_deprecated_quantization=True,"
     )
 
     env_vars = eval_config.get("env_vars", None)
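
For context, launch_lm_eval assembles a comma-separated model_args string that is handed to lm-eval's vLLM backend, and the change above appends one more flag to it. A minimal Python sketch of the pattern, with illustrative values standing in for what the real function reads from eval_config:

# Sketch only: how the comma-separated model_args string is built.
# trust_remote_code and max_model_len are illustrative values here;
# the real launch_lm_eval pulls them from eval_config.
trust_remote_code = True
max_model_len = 4096

model_args = (
    f"add_bos_token=true,"
    f"trust_remote_code={trust_remote_code},"
    f"max_model_len={max_model_len},"
    "allow_deprecated_quantization=True,"  # the flag added by this change
)
print(model_args)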
11 changes: 10 additions & 1 deletion .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -209,12 +209,21 @@ if [[ $commands == *"--shard-id="* ]]; then
     wait "${pid}"
     STATUS+=($?)
   done
+  at_least_one_shard_with_tests=0
   for st in "${STATUS[@]}"; do
-    if [[ ${st} -ne 0 ]]; then
+    if [[ ${st} -ne 0 ]] && [[ ${st} -ne 5 ]]; then
       echo "One of the processes failed with $st"
       exit "${st}"
+    elif [[ ${st} -eq 5 ]]; then
+      echo "Shard exited with status 5 (no tests collected) - treating as success"
+    else # This means st is 0
+      at_least_one_shard_with_tests=1
     fi
   done
+  if [[ ${#STATUS[@]} -gt 0 && ${at_least_one_shard_with_tests} -eq 0 ]]; then
+    echo "All shards reported no tests collected. Failing the build."
+    exit 1
+  fi
 else
   echo "Render devices: $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES"
   docker run \
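
For reference, pytest exits with status 5 when it collects no tests; the new branches tolerate that per shard while still failing the build if every shard came back empty. A minimal Python sketch of the same aggregation logic (the statuses list is hypothetical, standing in for the script's STATUS array):

# Hypothetical per-shard pytest exit codes; 5 means "no tests collected".
statuses = [0, 5, 5]

at_least_one_shard_with_tests = False
for st in statuses:
    if st not in (0, 5):
        raise SystemExit(st)  # a real failure: propagate the shard's exit code
    if st == 0:
        at_least_one_shard_with_tests = True  # this shard collected and ran tests

if statuses and not at_least_one_shard_with_tests:
    raise SystemExit("All shards reported no tests collected.")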
24 changes: 19 additions & 5 deletions .buildkite/test-amd.yaml
@@ -162,8 +162,7 @@ steps:
   - tests/entrypoints/test_chat_utils
   commands:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
-  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/test_vision_embeds.py
-  - pytest -v -s entrypoints/openai/test_vision_embeds.py
+  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses
   - pytest -v -s entrypoints/test_chat_utils.py
 
 - label: Entrypoints Integration Test (API Server 2)
@@ -200,6 +199,21 @@ steps:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
   - pytest -v -s entrypoints/pooling
 
+- label: Entrypoints Integration Test (Responses API)
+  timeout_in_minutes: 50
+  mirror_hardwares: [amdexperimental]
+  agent_pool: mi325_1
+  # grade: Blocking
+  working_dir: "/vllm-workspace/tests"
+  fast_check: true
+  torch_nightly: true
+  source_file_dependencies:
+  - vllm/
+  - tests/entrypoints/openai/responses
+  commands:
+  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
+  - pytest -v -s entrypoints/openai/responses
+
 - label: Distributed Tests (4 GPUs) # 35min
   timeout_in_minutes: 50
   mirror_hardwares: [amdexperimental]
@@ -731,7 +745,7 @@ steps:
 
 - label: Quantization Test # 70min
   timeout_in_minutes: 90
-  mirror_hardwares: [amdexperimental]
+  mirror_hardwares: [amdexperimental, amdproduction]
   agent_pool: mi325_1
   # grade: Blocking
   source_file_dependencies:
@@ -856,7 +870,7 @@ steps:
 - label: Language Models Tests (Extra Standard) %N
   timeout_in_minutes: 45
   mirror_hardwares: [amdexperimental]
-  agent_pool: mi325_2
+  agent_pool: mi325_8
   # grade: Blocking
   torch_nightly: true
   source_file_dependencies:
@@ -1105,8 +1119,8 @@ steps:
   - vllm/v1/attention/backends/flashinfer.py
   - vllm/v1/attention/backends/mla/cutlass_mla.py
   - vllm/v1/attention/backends/mla/flashinfer_mla.py
+  - vllm/v1/attention/selector.py
   - vllm/platforms/cuda.py
-  - vllm/attention/selector.py
   commands:
   - nvidia-smi
   - python3 examples/offline_inference/basic/chat.py
16 changes: 14 additions & 2 deletions .buildkite/test-pipeline.yaml
@@ -144,7 +144,7 @@ steps:
   - tests/entrypoints/test_chat_utils
   commands:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
-  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/
+  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses
   - pytest -v -s entrypoints/test_chat_utils.py
 
@@ -177,6 +177,18 @@ steps:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
   - pytest -v -s entrypoints/pooling
 
+- label: Entrypoints Integration Test (Responses API)
+  timeout_in_minutes: 50
+  mirror_hardwares: [amdexperimental]
+  working_dir: "/vllm-workspace/tests"
+  fast_check: true
+  torch_nightly: true
+  source_file_dependencies:
+  - vllm/
+  - tests/entrypoints/openai/responses
+  commands:
+  - pytest -v -s entrypoints/openai/responses
+
 - label: Distributed Tests (4 GPUs) # 35min
   timeout_in_minutes: 50
   mirror_hardwares: [amdexperimental]
@@ -954,8 +966,8 @@ steps:
   - vllm/v1/attention/backends/flashinfer.py
   - vllm/v1/attention/backends/mla/cutlass_mla.py
   - vllm/v1/attention/backends/mla/flashinfer_mla.py
+  - vllm/v1/attention/selector.py
   - vllm/platforms/cuda.py
-  - vllm/attention/selector.py
   commands:
   - nvidia-smi
   - python3 examples/offline_inference/basic/chat.py
11 changes: 9 additions & 2 deletions .buildkite/test_areas/entrypoints.yaml
@@ -34,10 +34,9 @@ steps:
   - tests/entrypoints/test_chat_utils
   commands:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
-  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/
+  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses
   - pytest -v -s entrypoints/test_chat_utils.py
 
-
 - label: Entrypoints Integration (API Server 2)
   timeout_in_minutes: 130
   working_dir: "/vllm-workspace/tests"
@@ -64,6 +63,14 @@ steps:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
   - pytest -v -s entrypoints/pooling
 
+- label: Entrypoints Integration (Responses API)
+  timeout_in_minutes: 50
+  working_dir: "/vllm-workspace/tests"
+  source_file_dependencies:
+  - vllm/
+  - tests/entrypoints/openai/responses
+  commands:
+  - pytest -v -s entrypoints/openai/responses
 
 - label: Entrypoints V1
   timeout_in_minutes: 50
2 changes: 1 addition & 1 deletion .buildkite/test_areas/kernels.yaml
@@ -90,8 +90,8 @@ steps:
   - vllm/v1/attention/backends/flashinfer.py
   - vllm/v1/attention/backends/mla/cutlass_mla.py
   - vllm/v1/attention/backends/mla/flashinfer_mla.py
+  - vllm/v1/attention/selector.py
   - vllm/platforms/cuda.py
-  - vllm/attention/selector.py
   commands:
   - nvidia-smi
   - python3 examples/offline_inference/basic/chat.py
8 changes: 4 additions & 4 deletions .github/CODEOWNERS
@@ -3,7 +3,6 @@
 
 # This lists cover the "core" components of vLLM that require careful review
 /vllm/attention @LucasWilkinson
-/vllm/attention/backends/abstract.py @WoosukKwon @zhuohan123 @youkaichao @alexm-redhat @njhill
 /vllm/executor/executor_base.py @zhuohan123 @youkaichao @alexm-redhat @njhill @22quinn
 /vllm/model_executor/layers/fused_moe @mgoin @pavanimajety
 /vllm/model_executor/layers/quantization @mgoin @robertgshaw2-redhat @tlrmchlsmth @yewentao256 @pavanimajety
@@ -27,6 +26,7 @@ CMakeLists.txt @tlrmchlsmth @LucasWilkinson
 
 # vLLM V1
 /vllm/v1/attention @LucasWilkinson
+/vllm/v1/attention/backend.py @WoosukKwon @zhuohan123 @youkaichao @alexm-redhat @njhill
 /vllm/v1/attention/backends/mla @pavanimajety
 /vllm/v1/attention/backends/flashinfer.py @mgoin @pavanimajety
 /vllm/v1/attention/backends/triton_attn.py @tdoublep
@@ -117,15 +117,15 @@ mkdocs.yaml @hmellor
 /vllm/transformers_utils/tokenizers/mistral.py @patrickvonplaten
 
 # Kernels
-/vllm/attention/ops/chunked_prefill_paged_decode.py @tdoublep
-/vllm/attention/ops/triton_unified_attention.py @tdoublep
+/vllm/v1/attention/ops/chunked_prefill_paged_decode.py @tdoublep
+/vllm/v1/attention/ops/triton_unified_attention.py @tdoublep
 
 # ROCm related: specify owner with write access to notify AMD folks for careful code review
 /vllm/**/*rocm* @tjtanaa
 /docker/Dockerfile.rocm* @gshtras @tjtanaa
 /vllm/v1/attention/backends/rocm*.py @gshtras @tjtanaa
 /vllm/v1/attention/backends/mla/rocm*.py @gshtras @tjtanaa
-/vllm/attention/ops/rocm*.py @gshtras @tjtanaa
+/vllm/v1/attention/ops/rocm*.py @gshtras @tjtanaa
 /vllm/model_executor/layers/fused_moe/rocm*.py @gshtras @tjtanaa
 /csrc/rocm @gshtras @tjtanaa
 /requirements/*rocm* @tjtanaa
4 changes: 2 additions & 2 deletions .github/mergify.yml
@@ -222,10 +222,10 @@ pull_request_rules:
       - files~=^csrc/rocm/
       - files~=^docker/Dockerfile.rocm
       - files~=^requirements/rocm.*\.txt
-      - files~=^vllm/attention/backends/rocm.*\.py
-      - files~=^vllm/attention/ops/rocm.*\.py
       - files~=^vllm/model_executor/layers/fused_moe/rocm.*\.py
       - files~=^vllm/v1/attention/backends/rocm.*\.py
+      - files~=^vllm/v1/attention/backends/mla/rocm.*\.py
+      - files~=^vllm/v1/attention/ops/rocm.*\.py
       - files~=^tests/kernels/.*_rocm.*\.py
       - files=vllm/platforms/rocm.py
       - title~=(?i)AMD
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -282,6 +282,7 @@ endif()
 set(VLLM_EXT_SRC
   "csrc/mamba/mamba_ssm/selective_scan_fwd.cu"
   "csrc/cache_kernels.cu"
+  "csrc/cache_kernels_fused.cu"
   "csrc/attention/paged_attention_v1.cu"
   "csrc/attention/paged_attention_v2.cu"
   "csrc/attention/merge_attn_states.cu"
6 changes: 3 additions & 3 deletions benchmarks/kernels/benchmark_reshape_and_cache_flash.py
@@ -7,16 +7,16 @@
 from tabulate import tabulate
 
 from vllm import _custom_ops as ops
-from vllm.attention.ops.triton_reshape_and_cache_flash import (
-    triton_reshape_and_cache_flash,
-)
 from vllm.logger import init_logger
 from vllm.utils.argparse_utils import FlexibleArgumentParser
 from vllm.utils.torch_utils import (
     STR_DTYPE_TO_TORCH_DTYPE,
     create_kv_caches_with_random_flash,
     set_random_seed,
 )
+from vllm.v1.attention.ops.triton_reshape_and_cache_flash import (
+    triton_reshape_and_cache_flash,
+)
 
 logger = init_logger(__name__)
 
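For downstream scripts that import this kernel directly, a hedged compatibility shim (assuming only the module path changed here, which is all the diff shows):

# Compatibility sketch: this PR moves the kernel from vllm.attention.ops
# to vllm.v1.attention.ops; fall back to the old path on older vLLM builds.
try:
    from vllm.v1.attention.ops.triton_reshape_and_cache_flash import (
        triton_reshape_and_cache_flash,
    )
except ImportError:  # pre-restructure vLLM
    from vllm.attention.ops.triton_reshape_and_cache_flash import (
        triton_reshape_and_cache_flash,
    )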