Draft
Changes from all commits (60 commits)
9705fba  [cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/… (fadara01, Oct 4, 2025)
ea507c3  [V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752) (s3woz, Oct 4, 2025)
d3d649e  Support expert parallel in Transformers backend (#26162) (hmellor, Oct 4, 2025)
44ea851  [Model] Support nested structures for TensorSchema (#26212) (DarkLight1337, Oct 4, 2025)
736fbf4  [Misc] Require `merge_by_field_config` argument (#26214) (DarkLight1337, Oct 4, 2025)
7c2e91c  [Misc] Remove unused `executor.apply_model` (#26215) (DarkLight1337, Oct 4, 2025)
7d6b033  [CI Failure] fix_test_auto_prefix_cache_support (#26053) (hl475, Oct 4, 2025)
1838cd4  Revert "Add batch invariant kernel override for FlashInfer backend [2… (DarkLight1337, Oct 4, 2025)
d0df145  Add Olmo 3 reasoning parser (#26054) (soldni, Oct 4, 2025)
f05fea1  [Core] Enable decode of context length equal to max model length (#26… (yannicks1, Oct 4, 2025)
2a6dc67  [Bugfix] Fix `_reqs_to_process` leak on abort (#26012) (NickLucche, Oct 4, 2025)
4570535  [Model] CLIP Embedding Support (#26010) (DarkLight1337, Oct 4, 2025)
86ee949  Fix tensor device and dtype placement in Qwen2VL model (#26219) (yuafng, Oct 4, 2025)
ed3aeb2  [V1] [Hybrid] Remove code to override default CUDA graph configuratio… (tdoublep, Oct 4, 2025)
5c057e0  [CPU] Refine batch reorder of CPU attention backend (#26096) (bigPYJ1151, Oct 4, 2025)
a42d2df  [Frontend] Cache chat template kwargs resolution (#26227) (Isotr0py, Oct 4, 2025)
119f006  [Renderer] Clean up renderer code (#26216) (DarkLight1337, Oct 4, 2025)
59a85c3  [Model] Use `merge_by_field_config` for MM models (H-L) (#26230) (DarkLight1337, Oct 5, 2025)
78c1d5b  [Easy] Add str repr for IterationStats (#26232) (22quinn, Oct 5, 2025)
a964e5e  [Bugfix] Allow `--skip-tokenizer-init` with `echo and return_token_id… (DarkLight1337, Oct 5, 2025)
e0986ea  Add documentation for granite 4 tool calling (#26175) (maxdebayser, Oct 5, 2025)
201c971  [Perf][Easy] Early stop in request_block_hasher (#26112) (Jialin, Oct 5, 2025)
432e1cb  [Bugfix]: Assertion error when using FlashInfer backend (#25933) (simondanielsson, Oct 5, 2025)
b7e8e4e  [Bugfix] Always apply MM processor even when no MM items are passed (… (DarkLight1337, Oct 5, 2025)
3303cfb  [Bugfix][Hardware][RISC-V] Limit supported dtypes to float32 to avoid… (ihb2032, Oct 5, 2025)
17edd8a  [Platform][Kernel] platform-specific kernel loading (#25823) (ILikeIneine, Oct 5, 2025)
d6953be  Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) (hmellor, Oct 5, 2025)
4e256ca  Remove all references to `yapf` as it's no longer used (#26251) (hmellor, Oct 5, 2025)
557b2e9  Remove all cases of `fmt: on/off` (#26253) (hmellor, Oct 5, 2025)
5f31753  fix(tests): Resolve late binding of loop variable in assert message l… (ihb2032, Oct 5, 2025)
1c0c682  Fix per file ruff ignores related to typing (#26254) (hmellor, Oct 5, 2025)
512b8af  Update `ruff` pre-commit hooks version (#26255) (hmellor, Oct 5, 2025)
9c3c21c  [CI] fix mamba kernel test (#26250) (ZJY0516, Oct 5, 2025)
6b6e987  [NVIDIA] flashinfer TRTLLM attention prefill token limit (#25998) (jasonlizhengjian, Oct 5, 2025)
b893d66  Fix per file ruff ignores related to simplification (#26259) (hmellor, Oct 5, 2025)
60bc25e  [CI] Add Blackwell LM Eval Small Models test to nightly (#26052) (mgoin, Oct 5, 2025)
f509a20  [DOC] Update production-stack.md (#26177) (elieserr, Oct 5, 2025)
d3c8429  [CI] Add comment about the single cudagraph capture size that is used… (tdoublep, Oct 6, 2025)
778f554  [V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching (#26222) (tdoublep, Oct 6, 2025)
59b4776  [Doc] Edited minor typo (#26266) (orangeng, Oct 6, 2025)
4be7d7c  [MISC] Add heheda12345 to CODEOWNERS of vllm/config/cache.py (#26270) (heheda12345, Oct 6, 2025)
91ac7f7  [CI][gpt-oss] Enable python tool tests in CI (#24315) (wuhang2014, Oct 6, 2025)
6c04638  Fix per file ruff ignores related to line length (#26262) (hmellor, Oct 6, 2025)
039b6ba  Bump actions/stale from 10.0.0 to 10.1.0 (#26272) (dependabot[bot], Oct 6, 2025)
7c2ec0f  [Benchmarking] Add disable_shuffle option for dataset loading (#26258) (ymoslem, Oct 6, 2025)
43c146c  [Misc] Clean up unnecessary E501 ignore (#26274) (ywang96, Oct 6, 2025)
59f30d0  [Docs] Edit HF Inference Endpoints documentation (#26275) (ariG23498, Oct 6, 2025)
77c95f7  [Doc] add KAITO to integrations (#25521) (abhisheksheth28, Oct 6, 2025)
391612e  [Frontend] Consolidate tokenizer init code (#26276) (DarkLight1337, Oct 6, 2025)
19a00eb  [Model] Use `merge_by_field_config` for MM models (Llava family) (#26… (DarkLight1337, Oct 6, 2025)
0340f45  Support expert parallel load balancing in Transformers backend (#26287) (hmellor, Oct 6, 2025)
ab5e7d9  [Bugfix] Fix mrope in Transformers Backend (#26087) (zucchini-nlp, Oct 6, 2025)
fc67969  Fix `DotsOCR` tensor type (#26281) (what-in-the-nim, Oct 6, 2025)
cb5c553  Add Eagle3 config support for auxiliary hidden state layer IDs (rahul-tuli, Sep 30, 2025)
07e7c78  Document Eagle3 auxiliary layer default selection in Llama (rahul-tuli, Sep 30, 2025)
58dfcf6  Implement SupportsEagle3 interface for Llama4 multimodal models (rahul-tuli, Sep 30, 2025)
730f04d  Override get_input_embeddings in Eagle3 to process text-only inputs (rahul-tuli, Sep 30, 2025)
06c6c93  Add dynamic Eagle3 auxiliary layer configuration from speculative config (rahul-tuli, Sep 30, 2025)
1c1d679  Review comments (rahul-tuli, Oct 3, 2025)
1037b36  Use get_input_embeddings (rahul-tuli, Oct 3, 2025)
@@ -368,7 +368,7 @@ def parse_client_command(cmd: str) -> dict[str, Any]:
# The GPUs sometimes come in format of "GPUTYPE\nGPUTYPE\n...",
# we want to turn it into "8xGPUTYPE"
df["GPU"] = df["GPU"].apply(
lambda x: f"{len(x.split('\n'))}x{x.split('\n')[0]}"
lambda x: f"{len(x.splitlines())}x{x.splitlines()[0]}"
)

# get markdown tables
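For context, a minimal standalone sketch of the transformation this hunk performs, using made-up sample data; `str.splitlines()` also sidesteps putting a `\n` escape inside the f-string expression, which Python versions before 3.12 reject:

```python
# Minimal sketch with made-up data: collapse "GPUTYPE\nGPUTYPE\n..." into "NxGPUTYPE".
gpu_column_value = "H100\nH100\nH100\nH100"


def summarize_gpus(x: str) -> str:
    parts = x.splitlines()
    return f"{len(parts)}x{parts[0]}"


print(summarize_gpus(gpu_column_value))  # -> 4xH100
```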
46 changes: 0 additions & 46 deletions .buildkite/pyproject.toml

This file was deleted.

15 changes: 13 additions & 2 deletions .buildkite/test-pipeline.yaml
@@ -477,6 +477,7 @@ steps:
source_file_dependencies:
- csrc/mamba/
- tests/kernels/mamba
- vllm/model_executor/layers/mamba/ops
commands:
- pytest -v -s kernels/mamba

@@ -834,11 +835,11 @@ steps:
- pytest -v -s tests/kernels/moe/test_flashinfer.py
- pytest -v -s tests/compile/test_silu_mul_quant_fusion.py

- label: GPT-OSS Eval (Blackwell)
- label: Blackwell GPT-OSS Eval
timeout_in_minutes: 60
working_dir: "/vllm-workspace/"
gpu: b200
optional: true # disable while debugging
optional: true # run on nightlies
source_file_dependencies:
- tests/evals/gpt_oss
- vllm/model_executor/models/gpt_oss.py
@@ -865,6 +866,16 @@
commands:
- pytest -s -v tests/quantization/test_blackwell_moe.py

- label: Blackwell LM Eval Small Models
timeout_in_minutes: 75
gpu: b200
optional: true # run on nightlies
source_file_dependencies:
- csrc/
- vllm/model_executor/layers/quantization
commands:
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-blackwell.txt --tp-size=1

##### 1 GPU test #####
##### multi gpus test #####

1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -23,6 +23,7 @@ CMakeLists.txt @tlrmchlsmth @LucasWilkinson
# Any change to the VllmConfig changes can have a large user-facing impact,
# so spam a lot of people
/vllm/config @simon-mo @WoosukKwon @youkaichao @robertgshaw2-redhat @mgoin @tlrmchlsmth @houseroad @hmellor @yewentao256 @ProExpertProg
/vllm/config/cache.py @simon-mo @WoosukKwon @youkaichao @robertgshaw2-redhat @mgoin @tlrmchlsmth @houseroad @hmellor @yewentao256 @ProExpertProg @heheda12345

# vLLM V1
/vllm/v1 @WoosukKwon @robertgshaw2-redhat @njhill @ywang96 @comaniac @alexm-redhat
2 changes: 1 addition & 1 deletion .github/workflows/stale.yml
@@ -13,7 +13,7 @@ jobs:
actions: write
runs-on: ubuntu-latest
steps:
- uses: actions/stale@3a9db7e6a41a89f618792c92c0e97cc736e1b13f # v10.0.0
- uses: actions/stale@5f858e3efba33a5ca4407a664cc011ad407f2008 # v10.1.0
with:
# Increasing this value ensures that changes to this workflow
# propagate to all issues and PRs in days rather than months
16 changes: 2 additions & 14 deletions .pre-commit-config.yaml
@@ -6,28 +6,16 @@ default_stages:
- manual # Run in CI
exclude: 'vllm/third_party/.*'
repos:
- repo: https://github.com/google/yapf
rev: v0.43.0
hooks:
- id: yapf
args: [--in-place, --verbose]
# Keep the same list from yapfignore here to avoid yapf failing without any inputs
exclude: '(.buildkite|benchmarks|build|examples)/.*'
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.7
rev: v0.13.3
hooks:
- id: ruff
- id: ruff-check
args: [--output-format, github, --fix]
- id: ruff-format
files: ^(.buildkite|benchmarks|examples)/.*
- repo: https://github.com/crate-ci/typos
rev: v1.35.5
hooks:
- id: typos
- repo: https://github.com/PyCQA/isort
rev: 6.0.1
hooks:
- id: isort
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v20.1.3
hooks:
2 changes: 1 addition & 1 deletion benchmarks/benchmark_block_pool.py
@@ -2,9 +2,9 @@
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import gc

from benchmark_utils import TimeCollector
from tabulate import tabulate

from benchmark_utils import TimeCollector
from vllm.utils import FlexibleArgumentParser
from vllm.v1.core.block_pool import BlockPool

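In this diff and the next one, `from benchmark_utils import TimeCollector` appears twice because the line moves between import groups; assuming the second occurrence is the added line, the ruff-managed ordering places it with the local and vLLM imports rather than with third-party packages. A sketch of the resulting block:

```python
# Sketch of the resulting import grouping (assumes the second occurrence of the
# benchmark_utils import in the diff is the added line): stdlib first, then
# third-party, then local/first-party modules.
import gc  # stdlib

from tabulate import tabulate  # third-party

from benchmark_utils import TimeCollector  # local benchmark helper
from vllm.utils import FlexibleArgumentParser  # first-party (vLLM)
from vllm.v1.core.block_pool import BlockPool
```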
4 changes: 2 additions & 2 deletions benchmarks/benchmark_ngram_proposer.py
@@ -5,9 +5,9 @@
from unittest import mock

import numpy as np
from benchmark_utils import TimeCollector
from tabulate import tabulate

from benchmark_utils import TimeCollector
from vllm.config import (
CacheConfig,
DeviceConfig,
@@ -164,7 +164,7 @@ def invoke_main() -> None:
)
parser.add_argument(
"--batched", action="store_true", help="consider time to prepare batch"
) # noqa: E501
)
parser.add_argument(
"--num-iteration",
type=int,
9 changes: 4 additions & 5 deletions benchmarks/benchmark_serving_structured_output.py
@@ -37,14 +37,13 @@
import datasets
import numpy as np
import pandas as pd
from tqdm.asyncio import tqdm
from transformers import PreTrainedTokenizerBase

from backend_request_func import (
ASYNC_REQUEST_FUNCS,
RequestFuncInput,
RequestFuncOutput,
)
from tqdm.asyncio import tqdm
from transformers import PreTrainedTokenizerBase

try:
from vllm.transformers_utils.tokenizer import get_tokenizer
@@ -910,13 +909,13 @@ def create_argument_parser():
parser.add_argument(
"--tokenizer",
type=str,
help="Name or path of the tokenizer, if not using the default tokenizer.", # noqa: E501
help="Name or path of the tokenizer, if not using the default tokenizer.",
)
parser.add_argument(
"--tokenizer-mode",
type=str,
default="auto",
help="Name or path of the tokenizer, if not using the default tokenizer.", # noqa: E501
help="Name or path of the tokenizer, if not using the default tokenizer.",
)
parser.add_argument(
"--num-prompts",