
Commit 852b1e8

Authored by: andy-neuma, tristanleclercq, khluu, yihong0618, reidliu41
Sync upstream v0.8.4 (opendatahub-io#212)
SUMMARY: sync to upstream `v0.8.4` and cherry-pick of `7eb42556281d30436a3a988f2c9184ec63c59338`. The cherry-pick is @LucasWilkinson's llama4 patch.

GIT LOG:

```bash
commit b197179 (HEAD -> sync-upstream-v0.8.4, origin/sync-upstream-v0.8.4)
Author: Lucas Wilkinson <[email protected]>
Date:   Fri Apr 18 01:13:29 2025 -0400

    [BugFix] Accuracy fix for llama4 int4 - improperly casted scales (vllm-project#16801)

    Signed-off-by: Lucas Wilkinson <[email protected]>

commit 60267cc
Author: andy-neuma <[email protected]>
Date:   Mon Apr 21 14:57:52 2025 -0400

    remove duplicate entries

commit 9d18b50
Merge: db0e117 dc1b4a6
Author: andy-neuma <[email protected]>
Date:   Mon Apr 21 14:50:01 2025 -0400

    Merge remote-tracking branch 'upstream/v0.8.4' into sync-upstream-v0.8.4

commit db0e117
Author: andy-neuma <[email protected]>
Date:   Mon Apr 21 14:35:23 2025 -0400

    Revert "Revert "[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (vllm-project#15906)""

    This reverts commit 296c657.

commit dc1b4a6 (tag: v0.8.4, upstream/v0.8.4)
Author: Russell Bryant <[email protected]>
Date:   Sun Apr 13 22:13:38 2025 -0400

    [Core][V0] Enable regex support with xgrammar (vllm-project#13228)

    Signed-off-by: Russell Bryant <[email protected]>
```

COMMANDS:

```bash
git fetch upstream
git checkout -b sync-upstream-v0.8.4
git revert 296c657
git merge upstream/v0.8.4
git cherry-pick 7eb4255
```

TEST PLAN:
accept sync ... https://github.com/neuralmagic/nm-cicd/actions/runs/14581880024
release ...
https://github.com/neuralmagic/nm-cicd/actions/runs/14596026989 --------- Signed-off-by: Tristan Leclercq <[email protected]> Signed-off-by: yihong0618 <[email protected]> Signed-off-by: reidliu41 <[email protected]> Signed-off-by: Harry Mellor <[email protected]> Signed-off-by: chaunceyjiang <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]> Signed-off-by: Jonghyun Choe <[email protected]> Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Hyesoo Yang <[email protected]> Signed-off-by: Ben Jackson <[email protected]> Signed-off-by: Roger Wang <[email protected]> Signed-off-by: Isotr0py <[email protected]> Signed-off-by: rongfu.leng <[email protected]> Signed-off-by: Varun Sundar Rabindranath <[email protected]> Signed-off-by: paolovic <[email protected]> Signed-off-by: Chengji Yao <[email protected]> Signed-off-by: Kay Yan <[email protected]> Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: YamPengLi <[email protected]> Signed-off-by: WangErXiao <[email protected]> Signed-off-by: Aston Zhang <[email protected]> Signed-off-by: Chris Thi <[email protected]> Signed-off-by: drisspg <[email protected]> Signed-off-by: Jon Swenson <[email protected]> Signed-off-by: Keyun Tong <[email protected]> Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Xiaodong Wang <[email protected]> Signed-off-by: Yang Chen <[email protected]> Signed-off-by: Ye (Charlotte) Qi <[email protected]> Signed-off-by: Yong Hoon Shin <[email protected]> Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Lucia Fang <[email protected]> Signed-off-by: Gregory Shtrasberg <[email protected]> Signed-off-by: NickLucche <[email protected]> Signed-off-by: Benjamin Chislett <[email protected]> Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Leon Seidel <[email protected]> Signed-off-by: mgoin <[email 
protected]> Signed-off-by: youkaichao <[email protected]> Signed-off-by: Miles Williams <[email protected]> Signed-off-by: mgoin <[email protected]> Signed-off-by: Siyuan Liu <[email protected]> Signed-off-by: Kebe <[email protected]> Signed-off-by: simon-mo <[email protected]> Signed-off-by: Alex-Brooks <[email protected]> Signed-off-by: Tianyuan Wu <[email protected]> Signed-off-by: imkero <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Yue <[email protected]> Signed-off-by: tjtanaa <[email protected]> Signed-off-by: kliuae <[email protected]> Signed-off-by: luka <[email protected]> Signed-off-by: lvfei.lv <[email protected]> Signed-off-by: Ajay Vohra <[email protected]> Signed-off-by: Guillaume Calmettes <[email protected]> Signed-off-by: zh Wang <[email protected]> Signed-off-by: Chendi Xue <[email protected]> Signed-off-by: Joe Runde <[email protected]> Signed-off-by: zRzRzRzRzRzRzR <[email protected]> Signed-off-by: Aaron Ang <[email protected]> Signed-off-by: Benjamin Kitor <[email protected]> Signed-off-by: Michael Goin <[email protected]> Signed-off-by: Chenyaaang <[email protected]> Signed-off-by: cyy <[email protected]> Signed-off-by: wineandchord <[email protected]> Signed-off-by: LiuXiaoxuanPKU <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]> Signed-off-by: look <[email protected]> Signed-off-by: jadewang21 <[email protected]> Signed-off-by: alexey-belyakov <[email protected]> Signed-off-by: jiang.li <[email protected]> Signed-off-by: DefTruth <[email protected]> Signed-off-by: chaow <[email protected]> Signed-off-by: Tomasz Zielinski <[email protected]> Signed-off-by: rzou <[email protected]> Signed-off-by: Travis Johnson <[email protected]> Signed-off-by: Christian Sears <[email protected]> Signed-off-by: Gogs <[email protected]> Signed-off-by: Yuan Tang <[email protected]> Signed-off-by: Tianer 
Zhou <[email protected]> Signed-off-by: [email protected] <[email protected]> Signed-off-by: Jie Fu <[email protected]> Signed-off-by: snowcharm <[email protected]> Signed-off-by: Ryan McConville <[email protected]> Co-authored-by: Tristan Leclercq <[email protected]> Co-authored-by: Kevin H. Luu <[email protected]> Co-authored-by: yihong <[email protected]> Co-authored-by: Reid <[email protected]> Co-authored-by: reidliu41 <[email protected]> Co-authored-by: Harry Mellor <[email protected]> Co-authored-by: Chauncey <[email protected]> Co-authored-by: Jinzhen Lin <[email protected]> Co-authored-by: Jonghyun Choe <[email protected]> Co-authored-by: Lucia Fang <[email protected]> Co-authored-by: Hyesoo Yang <[email protected]> Co-authored-by: Ben Jackson <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Paul Schweigert <[email protected]> Co-authored-by: Isotr0py <[email protected]> Co-authored-by: rongfu.leng <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: paolovic <[email protected]> Co-authored-by: paolovic <[email protected]> Co-authored-by: Chengji Yao <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]> Co-authored-by: Martin Hoyer <[email protected]> Co-authored-by: Kay Yan <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Shanshan Shen <[email protected]> Co-authored-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Robin <[email protected]> Co-authored-by: Lu Fang <[email protected]> Co-authored-by: Lu Fang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Gregory Shtrasberg <[email protected]> Co-authored-by: Nicolò Lucchesi <[email protected]> Co-authored-by: Benjamin Chislett <[email protected]> Co-authored-by: Nick Hill <[email protected]> Co-authored-by: leon-seidel <[email 
protected]> Co-authored-by: Driss Guessous <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: youkaichao <[email protected]> Co-authored-by: Miles Williams <[email protected]> Co-authored-by: Satyajith Chilappagari <[email protected]> Co-authored-by: mgoin <[email protected]> Co-authored-by: Jennifer Zhao <[email protected]> Co-authored-by: zxfan-cpu <[email protected]> Co-authored-by: Yong Hoon Shin <[email protected]> Co-authored-by: Siyuan Liu <[email protected]> Co-authored-by: Kebe <[email protected]> Co-authored-by: Simon Mo <[email protected]> Co-authored-by: Alex Brooks <[email protected]> Co-authored-by: TY-AMD <[email protected]> Co-authored-by: wang.yuqi <[email protected]> Co-authored-by: Kero Liang <[email protected]> Co-authored-by: Lucas Wilkinson <[email protected]> Co-authored-by: Russell Bryant <[email protected]> Co-authored-by: Jee Jee Li <[email protected]> Co-authored-by: yueshen2016 <[email protected]> Co-authored-by: TJian <[email protected]> Co-authored-by: Hongxia Yang <[email protected]> Co-authored-by: kliuae <[email protected]> Co-authored-by: Luka Govedič <[email protected]> Co-authored-by: Accelerator1996 <[email protected]> Co-authored-by: ajayvohra2005 <[email protected]> Co-authored-by: Guillaume Calmettes <[email protected]> Co-authored-by: zh Wang <[email protected]> Co-authored-by: Chendi.Xue <[email protected]> Co-authored-by: Joe Runde <[email protected]> Co-authored-by: Yuxuan Zhang <[email protected]> Co-authored-by: Aaron Ang <[email protected]> Co-authored-by: Jintao <[email protected]> Co-authored-by: Benjamin Kitor <[email protected]> Co-authored-by: Chenyaaang <[email protected]> Co-authored-by: cyyever <[email protected]> Co-authored-by: Ye (Charlotte) Qi <[email protected]> Co-authored-by: wineandchord <[email protected]> Co-authored-by: Nicolò Lucchesi <[email protected]> Co-authored-by: Lily Liu <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: 
Yu Chin Fabian Lim <[email protected]> Co-authored-by: look <[email protected]> Co-authored-by: WWW <[email protected]> Co-authored-by: Alexey Belyakov <[email protected]> Co-authored-by: Li, Jiang <[email protected]> Co-authored-by: DefTruth <[email protected]> Co-authored-by: chaow-amd <[email protected]> Co-authored-by: Tomasz Zielinski <[email protected]> Co-authored-by: Richard Zou <[email protected]> Co-authored-by: Travis Johnson <[email protected]> Co-authored-by: Kai Wu <[email protected]> Co-authored-by: Christian Sears <[email protected]> Co-authored-by: Gogs <[email protected]> Co-authored-by: Yuan Tang <[email protected]> Co-authored-by: Tianer Zhou <[email protected]> Co-authored-by: Huazhong Ji <[email protected]> Co-authored-by: Jie Fu (傅杰) <[email protected]> Co-authored-by: SnowCharm <[email protected]> Co-authored-by: Ryan McConville <[email protected]> Co-authored-by: andy-neuma <[email protected]>
Parent: 0df24b9 · Commit: 852b1e8

File tree: 344 files changed (+11697 / −5443 lines)

Lines changed: 11 additions & 0 deletions

```diff
@@ -0,0 +1,11 @@
+# bash .buildkite/lm-eval-harness/run-lm-eval-gsm-vllm-baseline.sh -m nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16 -b auto -l 1319 -f 5 -t 1
+model_name: "nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16"
+tasks:
+- name: "gsm8k"
+  metrics:
+  - name: "exact_match,strict-match"
+    value: 0.31
+  - name: "exact_match,flexible-extract"
+    value: 0.47
+limit: 1319
+num_fewshot: 5
```
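The baseline config above records expected GSM8K scores for the quantized Qwen1.5-MoE model. As a rough illustration of how such a baseline could be checked against measured lm-eval results, here is a minimal sketch; the `check_results` helper, the dict representation, and the 5% relative tolerance are assumptions for illustration, not the harness's actual code.

```python
# Sketch: checking measured lm-eval metrics against a baseline config.
# The dict mirrors the YAML above; RTOL is an assumed tolerance.
RTOL = 0.05

baseline = {
    "model_name": "nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16",
    "tasks": [
        {
            "name": "gsm8k",
            "metrics": [
                {"name": "exact_match,strict-match", "value": 0.31},
                {"name": "exact_match,flexible-extract", "value": 0.47},
            ],
        }
    ],
    "limit": 1319,
    "num_fewshot": 5,
}


def check_results(baseline: dict, measured: dict) -> list:
    """Return failure messages; an empty list means all metrics pass."""
    failures = []
    for task in baseline["tasks"]:
        for metric in task["metrics"]:
            key = (task["name"], metric["name"])
            got = measured.get(key)
            want = metric["value"]
            if got is None or abs(got - want) > RTOL * want:
                failures.append(f"{key}: expected ~{want}, got {got}")
    return failures


# Hypothetical measured values close to the baseline: both pass.
measured = {
    ("gsm8k", "exact_match,strict-match"): 0.316,
    ("gsm8k", "exact_match,flexible-extract"): 0.463,
}
print(check_results(baseline, measured))  # → []
```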

.buildkite/lm-eval-harness/configs/models-small.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -4,7 +4,7 @@ Meta-Llama-3.2-1B-Instruct-INT8-compressed-tensors.yaml
 Meta-Llama-3-8B-Instruct-INT8-compressed-tensors-asym.yaml
 Meta-Llama-3-8B-Instruct-nonuniform-compressed-tensors.yaml
 Meta-Llama-3-8B-Instruct-Channelwise-compressed-tensors.yaml
-Minitron-4B-Base-FP8.yaml
+Qwen1.5-MoE-W4A16-compressed-tensors.yaml
 Qwen2-1.5B-Instruct-INT8-compressed-tensors.yaml
 Qwen2-1.5B-Instruct-FP8W8.yaml
 Meta-Llama-3-8B-QQQ.yaml
```

.buildkite/scripts/run-benchmarks.sh

Lines changed: 2 additions & 2 deletions

```diff
@@ -5,8 +5,8 @@
 set -ex
 set -o pipefail
 
-# cd into parent directory of this file
-cd "$(dirname "${BASH_SOURCE[0]}")/.."
+# cd 2 levels into the working directory
+cd "$(dirname "${BASH_SOURCE[0]}")/../.."
 
 (which wget && which curl) || (apt-get update && apt-get install -y wget curl)
```
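The `cd` change in run-benchmarks.sh moves the script's working directory two levels up from its own location instead of one, so the script runs from the repository root. A quick Python analogue of that path arithmetic (the `/repo` path is hypothetical):

```python
import os.path

# Shell: cd "$(dirname "${BASH_SOURCE[0]}")/../.."
# For a script at /repo/.buildkite/scripts/run-benchmarks.sh, the old
# version ("..") landed in /repo/.buildkite; the new one ("../..")
# lands at the repo root.
script = "/repo/.buildkite/scripts/run-benchmarks.sh"
old_workdir = os.path.normpath(os.path.join(os.path.dirname(script), ".."))
new_workdir = os.path.normpath(os.path.join(os.path.dirname(script), "..", ".."))
print(old_workdir)  # → /repo/.buildkite
print(new_workdir)  # → /repo
```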

.buildkite/test-pipeline.yaml

Lines changed: 14 additions & 12 deletions

```diff
@@ -163,11 +163,6 @@ steps:
   - tests/tracing
   commands:
   - pytest -v -s metrics
-  - "pip install \
-    'opentelemetry-sdk>=1.26.0,<1.27.0' \
-    'opentelemetry-api>=1.26.0,<1.27.0' \
-    'opentelemetry-exporter-otlp>=1.26.0,<1.27.0' \
-    'opentelemetry-semantic-conventions-ai>=0.4.1,<0.5.0'"
   - pytest -v -s tracing
 
 ##### fast check tests #####
@@ -292,6 +287,14 @@ steps:
   command: pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_chatglm3_tp.py --ignore=lora/test_llama_tp.py
   parallelism: 4
 
+- label: PyTorch Compilation Unit Tests
+  source_file_dependencies:
+  - vllm/
+  - tests/compile
+  commands:
+  - pytest -v -s compile/test_pass_manager.py
+  - pytest -v -s compile/test_fusion.py
+
 - label: PyTorch Fullgraph Smoke Test # 9min
   source_file_dependencies:
   - vllm/
@@ -301,7 +304,6 @@ steps:
   # these tests need to be separated, cannot combine
   - pytest -v -s compile/piecewise/test_simple.py
   - pytest -v -s compile/piecewise/test_toy_llama.py
-  - pytest -v -s compile/test_pass_manager.py
 
 - label: PyTorch Fullgraph Test # 18min
   source_file_dependencies:
@@ -376,8 +378,10 @@ steps:
   source_file_dependencies:
   - vllm/
   - tests/tool_use
+  - tests/mistral_tool_use
   commands:
   - pytest -v -s tool_use
+  - pytest -v -s mistral_tool_use
 
 ##### models test #####
 
@@ -389,7 +393,8 @@ steps:
   - pytest -v -s models/test_transformers.py
   - pytest -v -s models/test_registry.py
   # V1 Test: https://github.com/vllm-project/vllm/issues/14531
-  - VLLM_USE_V1=0 pytest -v -s models/test_initialization.py
+  - VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4'
+  - VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'llama4'
 
 - label: Language Models Test (Standard) # 32min
   #mirror_hardwares: [amd]
@@ -426,7 +431,7 @@ steps:
   - pip install git+https://github.com/TIGER-AI-Lab/Mantis.git
   - pytest -v -s models/multimodal
   - pytest -v -s models/decoder_only/audio_language -m 'core_model or quant_model'
-  - pytest -v -s --ignore models/decoder_only/vision_language/test_phi3v.py models/decoder_only/vision_language -m 'core_model or quant_model'
+  - pytest -v -s models/decoder_only/vision_language -m 'core_model or quant_model'
   - pytest -v -s models/embedding/vision_language -m core_model
   - pytest -v -s models/encoder_decoder/audio_language -m core_model
   - pytest -v -s models/encoder_decoder/language -m core_model
@@ -445,10 +450,7 @@ steps:
   - pip install git+https://github.com/TIGER-AI-Lab/Mantis.git
   - pytest -v -s models/decoder_only/audio_language -m 'not core_model and not quant_model'
   - pytest -v -s models/decoder_only/vision_language/test_models.py -m 'split(group=0) and not core_model and not quant_model'
-  # HACK - run phi3v tests separately to sidestep this transformers bug
-  # https://github.com/huggingface/transformers/issues/34307
-  - pytest -v -s models/decoder_only/vision_language/test_phi3v.py
-  - pytest -v -s --ignore models/decoder_only/vision_language/test_models.py --ignore models/decoder_only/vision_language/test_phi3v.py models/decoder_only/vision_language -m 'not core_model and not quant_model'
+  - pytest -v -s --ignore models/decoder_only/vision_language/test_models.py models/decoder_only/vision_language -m 'not core_model and not quant_model'
   - pytest -v -s models/embedding/vision_language -m 'not core_model'
   - pytest -v -s models/encoder_decoder/language -m 'not core_model'
   - pytest -v -s models/encoder_decoder/vision_language -m 'not core_model'
```
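The pipeline change above splits `test_initialization.py` into two pytest invocations via `-k 'not llama4'` and `-k 'llama4'`, so the llama4 cases run in a separate process. A toy sketch of how a `-k` substring filter partitions test names (simplified to bare `word` / `not word` expressions; real pytest `-k` supports full boolean expressions):

```python
def k_filter(names, expr):
    # Simplified pytest -k: supports only "word" and "not word".
    if expr.startswith("not "):
        word = expr[4:]
        return [n for n in names if word not in n]
    return [n for n in names if expr in n]


# Hypothetical parametrized test IDs for illustration.
tests = ["test_init[llama4]", "test_init[opt]", "test_init[qwen2]"]
print(k_filter(tests, "llama4"))      # → ['test_init[llama4]']
print(k_filter(tests, "not llama4"))  # → ['test_init[opt]', 'test_init[qwen2]']
```

Together the two filtered runs cover every test exactly once while isolating the llama4 cases.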

.github/ISSUE_TEMPLATE/600-new-model.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -9,7 +9,7 @@ body:
     value: >
       #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).
 
-      #### We also highly recommend you read https://docs.vllm.ai/en/latest/contributing/model/adding_model.html first to understand how to add a new model.
+      #### We also highly recommend you read https://docs.vllm.ai/en/latest/contributing/model/index.html first to understand how to add a new model.
 - type: textarea
   attributes:
     label: The model to consider.
```

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -3,4 +3,4 @@ FILL IN THE PR DESCRIPTION HERE
 FIX #xxxx (*link existing issues this PR will resolve*)
 
 <!--- pyml disable-next-line no-emphasis-as-heading -->
-**BEFORE SUBMITTING, PLEASE READ <https://docs.vllm.ai/en/latest/contributing/overview.html>**
+**BEFORE SUBMITTING, PLEASE READ <https://docs.vllm.ai/en/latest/contributing/overview.html>** (anything written below this line will be removed by GitHub Actions)
```

.pre-commit-config.yaml

Lines changed: 6 additions & 0 deletions

```diff
@@ -122,6 +122,12 @@ repos:
     language: system
     always_run: true
     pass_filenames: false
+  - id: update-dockerfile-graph
+    name: Update Dockerfile dependency graph
+    entry: tools/update-dockerfile-graph.sh
+    language: script
+    files: ^docker/Dockerfile$
+    pass_filenames: false
   # Keep `suggestion` last
   - id: suggestion
     name: Suggestion
```

CMakeLists.txt

Lines changed: 1 addition & 0 deletions

```diff
@@ -230,6 +230,7 @@ set(VLLM_EXT_SRC
   "csrc/cache_kernels.cu"
   "csrc/attention/paged_attention_v1.cu"
   "csrc/attention/paged_attention_v2.cu"
+  "csrc/attention/merge_attn_states.cu"
   "csrc/pos_encoding_kernels.cu"
   "csrc/activation_kernels.cu"
   "csrc/layernorm_kernels.cu"
```

README.md

Lines changed: 2 additions & 5 deletions

```diff
@@ -10,16 +10,13 @@ Easy, fast, and cheap LLM serving for everyone
 </h3>
 
 <p align="center">
-| <a href="https://docs.vllm.ai"><b>Documentation</b></a> | <a href="https://vllm.ai"><b>Blog</b></a> | <a href="https://arxiv.org/abs/2309.06180"><b>Paper</b></a> | <a href="https://x.com/vllm_project"><b>Twitter/X</b></a> | <a href="https://discuss.vllm.ai"><b>User Forum</b></a> | <a href="https://slack.vllm.ai"><b>Developer Slack</b></a> |
+| <a href="https://docs.vllm.ai"><b>Documentation</b></a> | <a href="https://blog.vllm.ai/"><b>Blog</b></a> | <a href="https://arxiv.org/abs/2309.06180"><b>Paper</b></a> | <a href="https://x.com/vllm_project"><b>Twitter/X</b></a> | <a href="https://discuss.vllm.ai"><b>User Forum</b></a> | <a href="https://slack.vllm.ai"><b>Developer Slack</b></a> |
 </p>
 
 ---
 
-[2025/04] We're hosting our first-ever *vLLM Asia Developer Day* in Singapore on *April 3rd*! This is a full-day event (9 AM - 9 PM SGT) in partnership with SGInnovate, AMD, and Embedded LLM. Meet the vLLM team and learn about LLM inference for RL, MI300X, and more! [Register Now](https://www.sginnovate.com/event/limited-availability-morning-evening-slots-remaining-inaugural-vllm-asia-developer-day)
-
----
-
 *Latest News* 🔥
+- [2025/04] We hosted [Asia Developer Day](https://www.sginnovate.com/event/limited-availability-morning-evening-slots-remaining-inaugural-vllm-asia-developer-day)! Please find the meetup slides from the vLLM team [here](https://docs.google.com/presentation/d/19cp6Qu8u48ihB91A064XfaXruNYiBOUKrBxAmDOllOo/edit?usp=sharing).
 - [2025/03] We hosted [vLLM x Ollama Inference Night](https://lu.ma/vllm-ollama)! Please find the meetup slides from the vLLM team [here](https://docs.google.com/presentation/d/16T2PDD1YwRnZ4Tu8Q5r6n53c5Lr5c73UV9Vd2_eBo4U/edit?usp=sharing).
 - [2025/03] We hosted [the first vLLM China Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg)! Please find the meetup slides from vLLM team [here](https://docs.google.com/presentation/d/1REHvfQMKGnvz6p3Fd23HhSO4c8j5WPGZV0bKYLwnHyQ/edit?usp=sharing).
 - [2025/03] We hosted [the East Coast vLLM Meetup](https://lu.ma/7mu4k4xx)! Please find the meetup slides [here](https://docs.google.com/presentation/d/1NHiv8EUFF1NLd3fEYODm56nDmL26lEeXCaDgyDlTsRs/edit#slide=id.g31441846c39_0_0).
```

benchmarks/README.md

Lines changed: 18 additions & 0 deletions

````diff
@@ -204,6 +204,24 @@ python3 vllm/benchmarks/benchmark_serving.py \
   --seed 42
 ```
 
+### Running With Sampling Parameters
+
+When using OpenAI-compatible backends such as `vllm`, optional sampling
+parameters can be specified. Example client command:
+
+```bash
+python3 vllm/benchmarks/benchmark_serving.py \
+  --backend vllm \
+  --model NousResearch/Hermes-3-Llama-3.1-8B \
+  --endpoint /v1/completions \
+  --dataset-name sharegpt \
+  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json \
+  --top-k 10 \
+  --top-p 0.9 \
+  --temperature 0.5 \
+  --num-prompts 10
+```
+
 ---
 ## Example - Offline Throughput Benchmark
````
0 commit comments