Merged
1003 commits
e64afa4
multi-node offline DP+EP example (#15484)
youkaichao Mar 26, 2025
0af4d76
Fix weight loading for some models in Transformers backend (#15544)
hmellor Mar 26, 2025
733e7c9
[Refactor] Remove unnecessary backend parameter in structured output …
aarnphm Mar 26, 2025
35fad35
[V1][Sampler] Faster top-k only implementation (#15478)
njhill Mar 26, 2025
27df519
Support SHA256 as hash function in prefix caching (#15297)
dr75 Mar 26, 2025
dd8a29d
Applying some fixes for K8s agents in CI (#15493)
Alexei-V-Ivanov-AMD Mar 26, 2025
b2e85e2
[V1] TPU - Revert to exponential padding by default (#15565)
alexm-redhat Mar 26, 2025
9d119a8
[V1] TPU CI - Fix test_compilation.py (#15570)
alexm-redhat Mar 26, 2025
7a88827
Use Cache Hinting for fused_moe kernel (#15511)
wrmedford Mar 26, 2025
e74ff40
[TPU] support disabling xla compilation cache (#15567)
yaochengji Mar 27, 2025
7a6d45b
Support FIPS enabled machines with MD5 hashing (#15299)
MattTheCuber Mar 27, 2025
9239bf7
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
ElizaWszola Mar 27, 2025
ce78f9a
Add automatic tpu label to mergify.yml (#15560)
mgoin Mar 27, 2025
69db16a
add platform check back (#15578)
Chenyaaang Mar 27, 2025
8095341
[misc] LoRA: Remove unused long context test data (#15558)
varun-sundar-rabindranath Mar 27, 2025
7f301dd
[Doc] Update V1 user guide for fp8 kv cache support (#15585)
wayzeng Mar 27, 2025
fb22be5
[moe][quant] add weight name case for offset (#15515)
MengqingCao Mar 27, 2025
54aa619
[V1] Refactor num_computed_tokens logic (#15307)
comaniac Mar 27, 2025
74d70b0
add retries and get rid of progress meter
moulalis Mar 27, 2025
6b55376
Merge branch 'rhoai-2.19' into moulalis-patch-3
moulalis Mar 27, 2025
a1d23a1
Merge pull request #93 from red-hat-data-services/moulalis-patch-3
moulalis Mar 27, 2025
dcf2a59
Allow torchao quantization in SiglipMLP (#15575)
jerryzh168 Mar 27, 2025
ecff830
[ROCm] Env variable to trigger custom PA (#15557)
gshtras Mar 27, 2025
619d3de
[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SE…
yaochengji Mar 27, 2025
df8d3d1
[Misc] Restrict ray version dependency and update PP feature warning …
ruisearch42 Mar 27, 2025
e1e0fd7
[TPU] Avoid Triton Import (#15589)
robertgshaw2-redhat Mar 27, 2025
f4c98b4
[Misc] Consolidate LRUCache implementations (#15481)
Avabowler Mar 27, 2025
43ed414
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587)
robertgshaw2-redhat Mar 27, 2025
e6c9053
[Misc] Clean up `scatter_patch_features` (#15559)
DarkLight1337 Mar 27, 2025
3f532cb
[Misc] Use model_redirect to redirect the model name to a local folde…
noooop Mar 27, 2025
6278bc8
Fix incorrect filenames in vllm_compile_cache.py (#15494)
zou3519 Mar 27, 2025
8063dfc
[Doc] update --system for transformers installation in docker doc (#1…
reidliu41 Mar 27, 2025
ac5bc61
[Model] MiniCPM-V/O supports V1 (#15487)
DarkLight1337 Mar 27, 2025
8958217
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on…
h-sugi Mar 27, 2025
eb3737d
Merge with v0.7.2.1_downstream
dtrifiro Mar 27, 2025
9bffc0f
solve conflicts with v0.7.3 (upstream)
dtrifiro Mar 27, 2025
45386e3
Dockerfile.ubi: build vllm instead of installing it from AIPCC wheels
dtrifiro Mar 27, 2025
0b93bec
make pre-commit happy
dtrifiro Mar 27, 2025
25ee230
Merge pull request #97 from dtrifiro/rhoai-2.19-nm-vllm-ent
dtrifiro Mar 27, 2025
0fde7da
Dockerfile.rocm.ubi: also include audio,video,tensorizer extras for l…
dtrifiro Mar 27, 2025
07bf813
[Doc] Link to onboarding tasks (#15629)
DarkLight1337 Mar 27, 2025
2471815
[Misc] Replace `is_encoder_decoder_inputs` with `split_enc_dec_inputs…
DarkLight1337 Mar 27, 2025
66aa4c0
[Feature] Add middleware to log API Server responses (#15593)
terrytangyuan Mar 27, 2025
13ac9ca
[Misc] Avoid direct access of global `mm_registry` in `compute_encode…
DarkLight1337 Mar 27, 2025
46450b8
Use absolute placement for Ask AI button (#15628)
hmellor Mar 27, 2025
4098b72
[Bugfix][TPU][V1] Fix recompilation (#15553)
NickLucche Mar 27, 2025
14a367f
add libsodium arg
ckhordiasma Mar 27, 2025
2f42223
Merge pull request #99 from red-hat-data-services/add-libsodium-version
ckhordiasma Mar 27, 2025
32d6692
Correct PowerPC to modern IBM Power (#15635)
clnperez Mar 27, 2025
112b3e5
[CI] Update rules for applying `tpu` label. (#15634)
russellb Mar 27, 2025
2885136
Dockerfile.ubi: fix chown cmd permissions
dtrifiro Mar 27, 2025
757d8a9
[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid…
WangErXiao Feb 22, 2025
15dac21
[V1] AsyncLLM data parallel (#13923)
njhill Mar 27, 2025
bd45912
[TPU] Lazy Import (#15656)
robertgshaw2-redhat Mar 28, 2025
726efc6
[Quantization][V1] BitsAndBytes support V1 (#15611)
jeejeelee Mar 28, 2025
4e0f607
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (…
kebe7jun Mar 28, 2025
b4245a4
[Doc] Fix dead links in Job Board (#15637)
wwl2755 Mar 28, 2025
8a49eea
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
robertgshaw2-redhat Mar 28, 2025
4ae17bf
Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645)
wrmedford Mar 28, 2025
e7f720e
[Misc]add coding benchmark for speculative decoding (#15303)
CXIAAAAA Mar 28, 2025
4d0ec37
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#…
gshtras Mar 28, 2025
cec8c7d
Refactor error handling for multiple exceptions in preprocessing (#15…
JasonZhu1313 Mar 28, 2025
8693e47
[Bugfix] Fix `mm_hashes` forgetting to be passed (#15668)
DarkLight1337 Mar 28, 2025
355f663
[V1] Remove legacy input registry (#15673)
DarkLight1337 Mar 28, 2025
2d9045f
[TPU][CI] Fix TPUModelRunner Test (#15667)
robertgshaw2-redhat Mar 28, 2025
32b14ba
[Refactor][Frontend] Keep all logic about reasoning into one class (#…
gaocegege Mar 28, 2025
280d074
[CPU][CI] Improve CPU Dockerfile (#15690)
bigPYJ1151 Mar 28, 2025
70f2c2a
[Bugfix] Fix 'InductorAdaptor object has no attribute 'cache_dir' (#1…
jeejeelee Mar 28, 2025
a10314c
[Misc] Fix test_sleep to use query parameters (#14373)
lizzzcai Mar 28, 2025
3bbaacb
[Bugfix][Frontend] Eliminate regex based check in reasoning full gene…
gaocegege Mar 28, 2025
fd5fd26
[Frontend] update priority for --api-key and VLLM_API_KEY (#15588)
reidliu41 Mar 28, 2025
0b41675
[Docs] Add "Generation quality changed" section to troubleshooting (#…
hmellor Mar 28, 2025
91276c5
[Model] Adding torch compile annotations to chatglm (#15624)
jeejeelee Mar 28, 2025
3b00ff9
[Bugfix][v1] xgrammar structured output supports Enum. (#15594)
chaunceyjiang Mar 28, 2025
541d1df
[Bugfix] `embed_is_patch` for Idefics3 (#15696)
DarkLight1337 Mar 28, 2025
7329ff5
[V1] Support disable_any_whtespace for guidance backend (#15584)
russellb Mar 28, 2025
2914006
[doc] add missing imports (#15699)
reidliu41 Mar 28, 2025
432cf22
[Bugfix] Fix regex compile display format (#15368)
kebe7jun Mar 28, 2025
47e9038
Fix cpu offload testing for gptq/awq/ct (#15648)
mgoin Mar 28, 2025
70e1322
[Minor] Remove TGI launching script (#15646)
WoosukKwon Mar 28, 2025
c6bc003
[Misc] Remove unused utils and clean up imports (#15708)
DarkLight1337 Mar 28, 2025
d03308b
[Misc] Remove stale func in KVTransferConfig (#14746)
ShangmingCai Mar 28, 2025
038bede
[TPU] [Perf] Improve Memory Usage Estimation (#15671)
robertgshaw2-redhat Mar 28, 2025
04437e3
[Bugfix] [torch.compile] Add Dynamo metrics context during compilatio…
ProExpertProg Mar 28, 2025
c3f687a
[V1] TPU - Fix the chunked prompt bug (#15713)
alexm-redhat Mar 28, 2025
26df46e
[Misc] cli auto show default value (#15582)
reidliu41 Mar 28, 2025
f3f8d8f
implement prometheus fast-api-instrumentor for http service metrics (…
daniel-salib Mar 29, 2025
cff8991
[Docs][V1] Optimize diagrams in prefix caching design (#15716)
simpx Mar 29, 2025
c802f54
[ROCm][AMD][Build] Update AMD supported arch list (#15632)
gshtras Mar 29, 2025
de1cb38
[Model] Support Skywork-R1V (#15397)
pengyuange Mar 29, 2025
762b424
[Docs] Document v0 engine support in reasoning outputs (#15739)
gaocegege Mar 29, 2025
6d531ad
[Misc][V1] Misc code streamlining (#15723)
njhill Mar 29, 2025
1286211
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
varun-sundar-rabindranath Mar 29, 2025
7a79920
[CI] Speed up V1 structured output tests (#15718)
russellb Mar 29, 2025
8427f70
Use numba 0.61 for python 3.10+ to support numpy>=2 (#15692)
cyyever Mar 29, 2025
5b800f0
[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoionts…
jinzhen-lin Mar 29, 2025
da461f3
[TPU][V1][Bugfix] Fix w8a8 recompiilation with GSM8K (#15714)
NickLucche Mar 29, 2025
7c1f760
[Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 (#15659)
yarongmu-google Mar 29, 2025
73aa704
[doc] update doc (#15740)
reidliu41 Mar 29, 2025
4965ec4
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433)
tjtanaa Mar 29, 2025
94744ba
[V1] [Feature] Collective RPC (#15444)
wwl2755 Mar 29, 2025
6fa7cd3
[Feature][Disaggregated] Support XpYd disaggregated prefill with Moon…
ShangmingCai Mar 29, 2025
c67abd6
[V1] Support interleaved modality items (#15605)
ywang96 Mar 29, 2025
2bc4be4
[V1][Minor] Simplify rejection sampler's parse_output (#15741)
WoosukKwon Mar 29, 2025
3c0ff91
[Bugfix] Fix Mllama interleaved images input support (#15564)
Isotr0py Mar 29, 2025
0455337
[CI] xgrammar structured output supports Enum. (#15757)
chaunceyjiang Mar 30, 2025
6909a76
[Bugfix] Fix Mistral guided generation using xgrammar (#15704)
juliendenize Mar 30, 2025
44c3a5a
[doc] update conda to usage link in installation (#15761)
reidliu41 Mar 30, 2025
7fd8c0f
fix test_phi3v (#15321)
pansicheng Mar 30, 2025
803d5c3
[V1] Override `mm_counts` for dummy data creation (#15703)
DarkLight1337 Mar 30, 2025
248e76c
fix: lint fix a ruff checkout syntax error (#15767)
yihong0618 Mar 30, 2025
bb103b2
[Bugfix] Added `embed_is_patch` mask for fuyu model (#15731)
kylehh Mar 30, 2025
f44d253
Merge remote-tracking branch 'upstream/main'
Mar 30, 2025
70fedd0
fix: Comments to English for better dev experience (#15768)
yihong0618 Mar 30, 2025
9b459ec
[V1][Scheduler] Avoid calling `_try_schedule_encoder_inputs` for ever…
WoosukKwon Mar 30, 2025
18ed313
[Misc] update the comments (#15780)
lcy4869 Mar 31, 2025
effc5d2
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup …
JenZhao Mar 31, 2025
e858294
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
charlifu Mar 31, 2025
b932c04
Recommend developing with Python 3.12 in developer guide (#15811)
hmellor Mar 31, 2025
e7ae3bf
fix: better install requirement for install in setup.py (#15796)
yihong0618 Mar 31, 2025
555aa21
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
youkaichao Mar 31, 2025
3aa2b6a
[Model] Update support for NemotronNAS models (#15008)
Naveassaf Mar 31, 2025
c2e7507
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
alex-jw-brooks Mar 31, 2025
037bcd9
[Bugfix] Fix missing return value in load_weights method of adapters.…
noc-turne Mar 31, 2025
e5ef4fa
Upgrade `transformers` to `v4.50.3` (#13905)
hmellor Mar 31, 2025
09e974d
[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816)
DarkLight1337 Mar 31, 2025
239b7be
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
ShangmingCai Mar 31, 2025
2de4118
fix: change GB to GiB in logging close #14979 (#15807)
yihong0618 Mar 31, 2025
9a2160f
[V1] TPU CI - Add basic perf regression test (#15414)
alexm-redhat Mar 31, 2025
d4bfc23
Fix Transformers backend compatibility check (#15290)
hmellor Mar 31, 2025
f98a492
[V1][Core] Remove unused speculative config from scheduler (#15818)
markmc Mar 31, 2025
e6e3c55
Move dockerfiles into their own directory (#14549)
hmellor Mar 31, 2025
b7b7676
[Distributed] Add custom allreduce support for ROCM (#14125)
ilmarkov Apr 1, 2025
a76f547
Rename fallback model and refactor supported models section (#15829)
hmellor Apr 1, 2025
a164aea
[Frontend] Add Phi-4-mini function calling support (#14886)
kinfey Apr 1, 2025
ff64739
[Bugfix][Model] fix mllama multi-image (#14883)
yma11 Apr 1, 2025
e830b01
[Bugfix] Fix extra comma (#15851)
haochengxia Apr 1, 2025
63d8eab
[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding …
alexwl Apr 1, 2025
7e4e709
[V1] TPU - Fix fused MOE (#15834)
alexm-redhat Apr 1, 2025
4a9ce17
[sleep mode] clear pytorch cache after sleep (#15248)
lionelvillard Apr 1, 2025
c7e63aa
[ROCm] Use device name in the warning (#15838)
gshtras Apr 1, 2025
3a5f0af
[V1] Implement sliding window attention in kv_cache_manager (#14097)
heheda12345 Apr 1, 2025
8af5a5c
fix: can not use uv run collect_env close #13888 (#15792)
yihong0618 Apr 1, 2025
30d6a01
[Feature] specify model in config.yaml (#15798)
wayzeng Apr 1, 2025
79455cf
[Misc] Enable V1 LoRA by default (#15320)
varun-sundar-rabindranath Apr 1, 2025
656fd72
[Misc] Fix speculative config repr string (#15860)
ShangmingCai Apr 1, 2025
d330558
[Docs] Fix small error in link text (#15868)
hmellor Apr 1, 2025
0a298ea
[Bugfix] Fix no video/image profiling edge case for `MultiModalDataPa…
Isotr0py Apr 1, 2025
8dd41d6
[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831)
ruisearch42 Apr 1, 2025
f3aca1e
setup correct nvcc version with CUDA_HOME (#15725)
chenyang78 Apr 1, 2025
51d7c6a
[Model] Support Mistral3 in the HF Transformers format (#15505)
mgoin Apr 1, 2025
2e45bd2
[Misc] remove unused script (#15746)
reidliu41 Apr 1, 2025
2b93162
Remove `format.sh` as it's been unsupported >70 days (#15884)
hmellor Apr 1, 2025
085cbc4
[New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876)
noooop Apr 1, 2025
2041c0e
[Doc] Quark quantization documentation (#15861)
Apr 1, 2025
b63bd14
Reinstate `format.sh` and make `pre-commit` installation simpler (#15…
hmellor Apr 1, 2025
4e5a0f6
[Misc] Allow using OpenCV as video IO fallback (#15055)
Isotr0py Apr 1, 2025
a57a304
[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm…
gshtras Apr 1, 2025
e59ca94
Add option to use DeepGemm contiguous grouped gemm kernel for fused M…
bnellnm Apr 1, 2025
dfa82e2
[CI/Build] Clean up LoRA tests (#15867)
jeejeelee Apr 1, 2025
38327cf
[Model] Aya Vision (#15441)
JenZhao Apr 1, 2025
9ec8257
[Model] Add module name prefixes to gemma3 (#15889)
cloud11665 Apr 1, 2025
7e3f7a4
[CI] Disable flaky structure decoding test temporarily. (#15892)
ywang96 Apr 1, 2025
a79cc68
[V1][Metrics] Initial speculative decoding metrics (#15151)
markmc Apr 1, 2025
e75a630
[V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
WoosukKwon Apr 1, 2025
7acd539
[Docs] update usage stats language (#15898)
simon-mo Apr 1, 2025
93491ae
[BugFix] make sure socket close (#15875)
yihong0618 Apr 1, 2025
9ef98d5
[Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454)
ZZBoom Apr 1, 2025
db9dfcf
[Docs] Add Ollama meetup slides (#15905)
simon-mo Apr 1, 2025
58f5a59
[Docs] Add Intel as Sponsor (#15913)
simon-mo Apr 2, 2025
24b7fb4
[Spec Decode] Fix input triton kernel for eagle (#15909)
ekagra-ranjan Apr 2, 2025
6efb195
[V1] Fix: make sure `k_index` is int64 for `apply_top_k_only` (#15907)
b8zhong Apr 2, 2025
2039c63
[Bugfix] Fix imports for MoE on CPU (#15841)
gau-nernst Apr 2, 2025
274d8e8
[V1][Minor] Enhance SpecDecoding Metrics Log in V1 (#15902)
WoosukKwon Apr 2, 2025
c920e01
[Doc] Update rocm.inc.md (#15917)
chun37 Apr 2, 2025
0e00d40
[V1][Bugfix] Fix typo in MoE TPU checking (#15927)
ywang96 Apr 2, 2025
aa557e6
[Benchmark]Fix error message (#15866)
Potabk Apr 2, 2025
cdb5701
[Misc] Replace print with logger (#15923)
chaunceyjiang Apr 2, 2025
4203926
[CI/Build] Further clean up LoRA tests (#15920)
jeejeelee Apr 2, 2025
2edc87b
[Bugfix] Fix cache block size calculation for CPU MLA (#15848)
gau-nernst Apr 2, 2025
101f148
[Build/CI] Update lm-eval to 0.4.8 (#15912)
cthi Apr 2, 2025
90969fb
[Kernel] Add more dtype support for GGUF dequantization (#15879)
LukasBluebaum Apr 2, 2025
ddb94c2
[core] Add tags parameter to wake_up() (#15500)
erictang000 Apr 2, 2025
14e53ed
[V1] Fix json_object support with xgrammar (#15488)
russellb Apr 2, 2025
51826d5
Add minimum version for `huggingface_hub` to enable Xet downloads (#1…
hmellor Apr 2, 2025
2529378
[Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the Op…
b8zhong Apr 2, 2025
44f9905
[CI] Remove duplicate entrypoints-test (#15940)
yankay Apr 2, 2025
594a8b9
[Bugfix] Fix the issue where the model name is empty string, causing …
chaunceyjiang Apr 2, 2025
98d7367
[Metrics] Hide deprecated metrics (#15458)
markmc Apr 2, 2025
cefb9e5
[Frontend] Implement Tool Calling with `tool_choice='required'` (#13483)
meffmadd Apr 2, 2025
550b280
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
bigPYJ1151 Apr 2, 2025
e86c414
[Model] use AutoWeightsLoader in model load_weights (#15770)
lengrongfu Apr 2, 2025
58e234a
[Misc] V1 LoRA support CPU offload (#15843)
jeejeelee Apr 2, 2025
8bd651b
Restricted cmake to be less than version 4 as 4.x breaks the build of…
npanpaliya Apr 2, 2025
1cab43c
[misc] instruct pytorch to use nvml-based cuda check (#15951)
youkaichao Apr 2, 2025
f021b97
[V1] Support Mistral3 in V1 (#15950)
mgoin Apr 2, 2025
55acf86
Fix `huggingface-cli[hf-xet]` -> `huggingface-cli[hf_xet]` (#15969)
hmellor Apr 2, 2025
1b84eff
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#1…
hyeygit Apr 3, 2025
01b6113
[TPU] optimize the all-reduce performance (#15903)
yaochengji Apr 3, 2025
bd7599d
[V1][TPU] Do not compile sampling more than needed (#15883)
NickLucche Apr 3, 2025
e73ff24
[ROCM][KERNEL] Paged attention for V1 (#15720)
maleksan85 Apr 3, 2025
37bfee9
fix: better error message for get_config close #13889 (#15943)
yihong0618 Apr 3, 2025
8b66470
[bugfix] add seed in torchrun_example.py (#15980)
youkaichao Apr 3, 2025
57a810d
[ROCM][V0] PA kennel selection when no sliding window provided (#15982)
maleksan85 Apr 3, 2025
06f21ce
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
StevenShi-23 Apr 3, 2025
5e125e7
[misc] improve error message for "Failed to infer device type" (#15994)
youkaichao Apr 3, 2025
463bbb1
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a back…
wwl2755 Apr 3, 2025
a43aa18
[doc] update contribution link (#15922)
reidliu41 Apr 3, 2025
84884cd
fix: tiny fix make format.sh excutable (#16015)
yihong0618 Apr 3, 2025
421c462
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573)
kylesayrs Apr 3, 2025
82e7e19
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952)
kylesayrs Apr 3, 2025
849c32e
upstreamsync-20250330 (#203)
andy-neuma Apr 3, 2025
d2b58ca
[Neuron][kernel] Fuse kv cache into a single tensor (#15911)
liangfu Apr 3, 2025
15ba07e
[Minor] Fused experts refactor (#15914)
bnellnm Apr 3, 2025
45b1ff7
[Misc][Performance] Advance tpu.txt to the most recent nightly torch …
yarongmu-google Apr 3, 2025
03a70ea
Re-enable the AMD Testing for the passing tests. (#15586)
Alexei-V-Ivanov-AMD Apr 3, 2025
b6be6f8
[TPU] Support sliding window and logit soft capping in the paged atte…
vanbasten23 Apr 3, 2025
f15e70d
[TPU] Switch Test to Non-Sliding Window (#15981)
robertgshaw2-redhat Apr 3, 2025
dcc56d6
[Bugfix] Fix function names in test_block_fp8.py (#16033)
bnellnm Apr 3, 2025
092475f
[ROCm] Tweak the benchmark script to run on ROCm (#14252)
huydhn Apr 4, 2025
86cbd2e
[Misc] improve gguf check (#15974)
reidliu41 Apr 4, 2025
fadc59c
[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041)
yaochengji Apr 4, 2025
4ef0bb1
doc: add info for macos clang errors (#16049)
yihong0618 Apr 4, 2025
a35a8a8
[V1][Spec Decode] Avoid logging useless nan metrics (#16023)
markmc Apr 4, 2025
bf7e3c5
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939)
jonghyunchoe Apr 4, 2025
0812d8d
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945)
zhenwei-intel Apr 4, 2025
230b131
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Isotr0py Apr 4, 2025
95862f7
[Benchmark][Doc] Update throughput benchmark and README (#15998)
StevenShi-23 Apr 4, 2025
2386803
[CPU] Change default block_size for CPU backend (#16002)
bigPYJ1151 Apr 4, 2025
ef608c3
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
ilmarkov Apr 4, 2025
40a36cc
[ROCm][Bugfix] Use platform specific FP8 dtype (#15717)
gshtras Apr 4, 2025
a6d042d
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, b…
gshtras Apr 4, 2025
4708f13
[Bugfix] Fix default behavior/fallback for pp in v1 (#16057)
mgoin Apr 4, 2025
4dc52e1
[CI] Reorganize .buildkite directory (#16001)
khluu Apr 4, 2025
5e96277
docker-bake: cleanup variable definitions (#204)
dtrifiro Apr 4, 2025
651cf0f
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queu…
njhill Apr 4, 2025
f5722a5
[V1] Scatter and gather placeholders in the model runner (#15712)
DarkLight1337 Apr 4, 2025
af51d80
Revert "[V1] Scatter and gather placeholders in the model runner" (#1…
ywang96 Apr 4, 2025
d6fc629
[Kernel][Minor] Re-fuse triton moe weight application (#16071)
bnellnm Apr 4, 2025
70ad3f9
[Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059)
mgoin Apr 4, 2025
63375f0
[V1][Spec Decode] Update N-gram Proposer Interface (#15750)
WoosukKwon Apr 4, 2025
c575232
[Model] Support Llama4 in vLLM (#16104)
houseroad Apr 6, 2025
296c657
Revert "[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for in…
simon-mo Apr 6, 2025
fec228f
docker-bake: bump flashinfer to 0.2.1.post2+cu124torch2.6
dtrifiro Apr 5, 2025
7fed335
Sync with upsteam @ v0.8.3 (296c6572d)
dtrifiro Apr 6, 2025
6bd4d08
Sync with neuralmagic/nm-vllm-ent @ v0.8.3.0-rc (f08840f14)
dtrifiro Apr 7, 2025
ec82cdb
Dockerfile*.ubi: fix requirements files path
dtrifiro Apr 7, 2025
a06f8ac
remove duplicate numba dependency
dtrifiro Apr 7, 2025
a133489
Dockerfile*.ubi: bump vllm-tgis-adapter to 0.7.0
dtrifiro Apr 7, 2025
09cbae3
Merge branch 'rhoai-2.20' into rhoai-2.19-sync-with-midstream-0.8.3.0
dtrifiro Apr 17, 2025
4 changes: 2 additions & 2 deletions .buildkite/lm-eval-harness/configs/Minitron-4B-Base-FP8.yaml
@@ -4,8 +4,8 @@ tasks:
- name: "gsm8k"
metrics:
- name: "exact_match,strict-match"
value: 0.233
value: 0.231
- name: "exact_match,flexible-extract"
value: 0.236
value: 0.22
limit: 1000
num_fewshot: 5
5 changes: 5 additions & 0 deletions .buildkite/lm-eval-harness/test_lm_eval_correctness.py
@@ -13,6 +13,7 @@

import lm_eval
import numpy
import pytest
import yaml

RTOL = 0.05
@@ -46,6 +47,10 @@ def test_lm_eval_correctness():
eval_config = yaml.safe_load(
Path(TEST_DATA_FILE).read_text(encoding="utf-8"))

if eval_config[
"model_name"] == "nm-testing/Meta-Llama-3-70B-Instruct-FBGEMM-nonuniform": #noqa: E501
pytest.skip("FBGEMM is currently failing on main.")

# Launch eval requests.
results = launch_lm_eval(eval_config)

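The hunk above hard-codes the known-failing model inline. A standalone sketch of the same guard, factored into a lookup table, looks like this (the `SkipTest` class stands in for the exception `pytest.skip()` raises, and the helper name is illustrative, not part of the actual test suite):

```python
class SkipTest(Exception):
    """Stand-in for the exception pytest.skip() raises; illustrative only."""


# Models known to fail on main, as recorded in the diff above.
KNOWN_FAILING = {
    "nm-testing/Meta-Llama-3-70B-Instruct-FBGEMM-nonuniform":
        "FBGEMM is currently failing on main.",
}


def maybe_skip(model_name: str) -> None:
    """Skip the eval early for a known-bad model instead of letting it
    surface as a spurious accuracy regression."""
    reason = KNOWN_FAILING.get(model_name)
    if reason is not None:
        raise SkipTest(reason)
```

A table like this keeps future skips to a one-line addition rather than another `if` branch in the test body.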
@@ -84,8 +84,13 @@ def results_to_json(latency, throughput, serving):
# this result is generated via `benchmark_serving.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands")) as f:
command = json.loads(f.read())
try:
with open(test_file.with_suffix(".commands")) as f:
command = json.loads(f.read())
except OSError as e:
print(e)
continue

raw_result.update(command)

# update the test name of this result
@@ -99,8 +104,13 @@ def results_to_json(latency, throughput, serving):
# this result is generated via `benchmark_latency.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands")) as f:
command = json.loads(f.read())
try:
with open(test_file.with_suffix(".commands")) as f:
command = json.loads(f.read())
except OSError as e:
print(e)
continue

raw_result.update(command)

# update the test name of this result
@@ -121,8 +131,13 @@ def results_to_json(latency, throughput, serving):
# this result is generated via `benchmark_throughput.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands")) as f:
command = json.loads(f.read())
try:
with open(test_file.with_suffix(".commands")) as f:
command = json.loads(f.read())
except OSError as e:
print(e)
continue

raw_result.update(command)

# update the test name of this result
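The same try/except is added three times above, once per benchmark type. A sketch of that hardened command-attachment logic as a single helper (the function name is ours; the `.commands` sidecar convention is from the diff):

```python
import json
from pathlib import Path


def load_benchmark_command(test_file: Path):
    """Return the parsed ``.commands`` sidecar for a result file, or None
    if the sidecar is missing or unreadable -- mirroring the try/except the
    diff adds so one absent file no longer aborts the whole summary pass."""
    try:
        with open(test_file.with_suffix(".commands")) as f:
            return json.loads(f.read())
    except OSError as e:
        print(e)
        return None
```

Callers would then `continue` when the helper returns `None`, exactly as the three diff hunks do after printing the error.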
@@ -426,7 +426,7 @@ main() {

pip install -U transformers

pip install -r requirements-dev.txt
pip install -r requirements/dev.txt
which genai-perf

# check storage
@@ -10,15 +10,24 @@ set -x
set -o pipefail

check_gpus() {
# check the number of GPUs and GPU type.
declare -g gpu_count=$(nvidia-smi --list-gpus | wc -l)
if command -v nvidia-smi; then
# check the number of GPUs and GPU type.
declare -g gpu_count=$(nvidia-smi --list-gpus | wc -l)
elif command -v amd-smi; then
declare -g gpu_count=$(amd-smi list | grep 'GPU' | wc -l)
fi

if [[ $gpu_count -gt 0 ]]; then
echo "GPU found."
else
echo "Need at least 1 GPU to run benchmarking."
exit 1
fi
declare -g gpu_type=$(nvidia-smi --query-gpu=name --format=csv,noheader | awk '{print $2}')
if command -v nvidia-smi; then
declare -g gpu_type=$(nvidia-smi --query-gpu=name --format=csv,noheader | awk '{print $2}')
elif command -v amd-smi; then
declare -g gpu_type=$(amd-smi static -g 0 -a | grep 'MARKET_NAME' | awk '{print $2}')
fi
echo "GPU type is $gpu_type"
}

@@ -90,9 +99,15 @@ kill_gpu_processes() {


# wait until GPU memory usage smaller than 1GB
while [ "$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1)" -ge 1000 ]; do
sleep 1
done
if command -v nvidia-smi; then
while [ "$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1)" -ge 1000 ]; do
sleep 1
done
elif command -v amd-smi; then
while [ "$(amd-smi metric -g 0 | grep 'USED_VRAM' | awk '{print $2}')" -ge 1000 ]; do
sleep 1
done
fi

# remove vllm config file
rm -rf ~/.config/vllm
@@ -309,11 +324,14 @@ run_serving_tests() {

new_test_name=$test_name"_qps_"$qps

# pass the tensor parallel size to the client so that it can be displayed
# on the benchmark dashboard
client_command="python3 benchmark_serving.py \
--save-result \
--result-dir $RESULTS_FOLDER \
--result-filename ${new_test_name}.json \
--request-rate $qps \
--metadata "tensor_parallel_size=$tp" \
$client_args"

echo "Running test case $test_name with qps $qps"
@@ -358,7 +376,7 @@ main() {
# get the current IP address, required by benchmark_serving.py
export VLLM_HOST_IP=$(hostname -I | awk '{print $1}')
# turn of the reporting of the status of each request, to clean up the terminal output
export VLLM_LOG_LEVEL="WARNING"
export VLLM_LOGGING_LEVEL="WARNING"

# prepare for benchmarking
cd benchmarks || exit 1
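The shell hunks above branch on `command -v nvidia-smi` versus `command -v amd-smi` to make the benchmark script vendor-agnostic. The same probe-and-fall-back idea, expressed as a Python analog rather than the script itself (the function name is illustrative):

```python
import shutil


def detect_gpu_tool():
    """Return the first available GPU management CLI, preferring
    nvidia-smi and falling back to amd-smi, as the updated script does;
    None means no supported GPU tooling is on PATH."""
    for tool in ("nvidia-smi", "amd-smi"):
        if shutil.which(tool):
            return tool
    return None
```

Probing for the tool once and branching on the result avoids sprinkling vendor checks through every function, which is the direction the diff moves the script in.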
10 changes: 6 additions & 4 deletions .buildkite/nightly-benchmarks/tests/serving-tests.json
@@ -63,10 +63,12 @@
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
"disable_log_requests": "",
"tensor_parallel_size": 4,
"swap_space": 16,
"speculative_model": "turboderp/Qwama-0.5B-Instruct",
"num_speculative_tokens": 4,
"speculative_draft_tensor_parallel_size": 1
"swap_space": 16,
"speculative_config": {
"model": "turboderp/Qwama-0.5B-Instruct",
"num_speculative_tokens": 4,
"draft_tensor_parallel_size": 1
}
},
"client_parameters": {
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
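The JSON hunk above replaces the flat `speculative_model` / `num_speculative_tokens` / `speculative_draft_tensor_parallel_size` keys with a nested `speculative_config` block. A sketch of that migration as a dict transform, under the assumption that these three keys are the full legacy set (the helper name is ours):

```python
def migrate_spec_params(params: dict) -> dict:
    """Fold the legacy flat speculative-decoding keys into the nested
    ``speculative_config`` block used by the updated serving tests."""
    params = dict(params)  # do not mutate the caller's dict
    spec = {}
    if "speculative_model" in params:
        spec["model"] = params.pop("speculative_model")
    if "num_speculative_tokens" in params:
        spec["num_speculative_tokens"] = params.pop("num_speculative_tokens")
    if "speculative_draft_tensor_parallel_size" in params:
        spec["draft_tensor_parallel_size"] = params.pop(
            "speculative_draft_tensor_parallel_size")
    if spec:
        params["speculative_config"] = spec
    return params
```

Running this over the old server-parameters block yields exactly the nested form shown in the diff, with non-speculative keys such as `tensor_parallel_size` left in place.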
2 changes: 1 addition & 1 deletion .buildkite/nightly-benchmarks/tests/throughput-tests.json
@@ -32,4 +32,4 @@
"backend": "vllm"
}
}
]
]
25 changes: 18 additions & 7 deletions .buildkite/release-pipeline.yaml
@@ -1,12 +1,23 @@
steps:
- label: "Build wheel - CUDA 12.4"
agents:
queue: cpu_queue_postmerge
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.4.0 --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
- "bash .buildkite/scripts/upload-wheels.sh"
env:
DOCKER_BUILDKIT: "1"

- label: "Build wheel - CUDA 12.1"
agents:
queue: cpu_queue_postmerge
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.1.0 --tag vllm-ci:build-image --target build --progress plain ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.1.0 --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
- "bash .buildkite/upload-wheels.sh"
- "bash .buildkite/scripts/upload-wheels.sh"
env:
DOCKER_BUILDKIT: "1"

@@ -20,10 +31,10 @@ steps:
agents:
queue: cpu_queue_postmerge
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=11.8.0 --tag vllm-ci:build-image --target build --progress plain ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=11.8.0 --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
- "bash .buildkite/upload-wheels.sh"
- "bash .buildkite/scripts/upload-wheels.sh"
env:
DOCKER_BUILDKIT: "1"

@@ -37,7 +48,7 @@ steps:
queue: cpu_queue_postmerge
commands:
- "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.1.0 --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT --target vllm-openai --progress plain ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.4.0 --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT --target vllm-openai --progress plain -f docker/Dockerfile ."
- "docker push public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT"

- label: "Build and publish TPU release image"
@@ -46,7 +57,7 @@ steps:
agents:
queue: tpu_queue_postmerge
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --tag vllm/vllm-tpu:nightly --tag vllm/vllm-tpu:$BUILDKITE_COMMIT --progress plain -f Dockerfile.tpu ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --tag vllm/vllm-tpu:nightly --tag vllm/vllm-tpu:$BUILDKITE_COMMIT --progress plain -f docker/Dockerfile.tpu ."
- "docker push vllm/vllm-tpu:nightly"
- "docker push vllm/vllm-tpu:$BUILDKITE_COMMIT"
plugins:
@@ -71,7 +82,7 @@ steps:
queue: cpu_queue_postmerge
commands:
- "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg GIT_REPO_CHECK=1 --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version) --progress plain -f Dockerfile.cpu ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg GIT_REPO_CHECK=1 --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version) --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest --progress plain --target vllm-openai -f docker/Dockerfile.cpu ."
- "docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)"
env:
DOCKER_BUILDKIT: "1"
16 changes: 0 additions & 16 deletions .buildkite/run-openvino-test.sh

This file was deleted.

26 changes: 0 additions & 26 deletions .buildkite/run-tpu-test.sh

This file was deleted.

19 changes: 0 additions & 19 deletions .buildkite/run-xpu-test.sh

This file was deleted.
