# Benchmarking vLLM

This README guides you through running benchmark tests with the extensive
datasets supported by vLLM. It's a living document, updated as new features and datasets
become available.

## Dataset Overview
<table style="width:100%; border-collapse: collapse;">
<thead>
<tr>
<th style="width:15%; text-align: left;">Dataset</th>
<th style="width:10%; text-align: center;">Online</th>
<th style="width:10%; text-align: center;">Offline</th>
<th style="width:65%; text-align: left;">Data Path</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>ShareGPT</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json</code></td>
</tr>
<tr>
<td><strong>BurstGPT</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv</code></td>
</tr>
<tr>
<td><strong>Sonnet</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td>Local file: <code>benchmarks/sonnet.txt</code></td>
</tr>
<tr>
<td><strong>Random</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>synthetic</code></td>
</tr>
<tr>
<td><strong>HuggingFace-VisionArena</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>lmarena-ai/VisionArena-Chat</code></td>
</tr>
<tr>
<td><strong>HuggingFace-InstructCoder</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>likaixin/InstructCoder</code></td>
</tr>
<tr>
<td><strong>HuggingFace-AIMO</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>AI-MO/aimo-validation-aime</code>, <code>AI-MO/NuminaMath-1.5</code>, <code>AI-MO/NuminaMath-CoT</code></td>
</tr>
<tr>
<td><strong>HuggingFace-Other</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>lmms-lab/LLaVA-OneVision-Data</code>, <code>Aeala/ShareGPT_Vicuna_unfiltered</code></td>
</tr>
<tr>
<td><strong>Custom</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td>Local file: <code>data.jsonl</code></td>
</tr>
</tbody>
</table>

✅: supported

🟡: partial support

🚧: to be supported

**Note**: For HuggingFace datasets, set `--dataset-name` to `hf`.

---
## Example - Online Benchmark

First, start serving your model:

```bash
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
```

Then run the benchmarking script:

```bash
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
python3 vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --endpoint /v1/completions \
  --dataset-name sharegpt \
  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 10
```

If successful, you will see the following output:

```
============ Serving Benchmark Result ============
Successful requests:                     10
Benchmark duration (s):                  5.78
Total input tokens:                      1369
Total generated tokens:                  2212
Request throughput (req/s):              1.73
Output token throughput (tok/s):         382.89
Total Token throughput (tok/s):          619.85
---------------Time to First Token----------------
Mean TTFT (ms):                          71.54
Median TTFT (ms):                        73.88
P99 TTFT (ms):                           79.49
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          7.91
Median TPOT (ms):                        7.96
P99 TPOT (ms):                           8.03
---------------Inter-token Latency----------------
Mean ITL (ms):                           7.74
Median ITL (ms):                         7.70
P99 ITL (ms):                            8.39
==================================================
```
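
You can also benchmark against fully synthetic prompts using the `random` dataset from the table above. A minimal sketch, assuming the server from the previous step is still running and that `--random-input-len`/`--random-output-len` control the sampled prompt and completion lengths:

```bash
# synthetic prompts: no dataset file needed
python3 vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --endpoint /v1/completions \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 128 \
  --num-prompts 100
```

The BurstGPT trace from the table can be benchmarked the same way: download the CSV and point `--dataset-path` at it with `--dataset-name burstgpt` (the dataset name is assumed to mirror the table entry).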

### Custom Dataset

If the dataset you want to benchmark is not yet supported in vLLM, you can still benchmark it using `CustomDataset`. Your data needs to be in `.jsonl` format, with a `"prompt"` field per entry, e.g. `data.jsonl`:

```
{"prompt": "What is the capital of India?"}
{"prompt": "What is the capital of Iran?"}
{"prompt": "What is the capital of China?"}
```
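
For quick experiments, a file in this shape can be generated straight from the shell, e.g.:

```bash
# write three single-prompt JSON lines to data.jsonl
printf '%s\n' \
  '{"prompt": "What is the capital of India?"}' \
  '{"prompt": "What is the capital of Iran?"}' \
  '{"prompt": "What is the capital of China?"}' > data.jsonl
```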

```bash
# start server on port 9001 (matching the client below)
VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --disable-log-requests --port 9001
```

```bash
# run benchmarking script
python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detailed \
  --backend vllm \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --endpoint /v1/completions \
  --dataset-name custom \
  --dataset-path <path-to-your-data-jsonl> \
  --custom-skip-chat-template \
  --num-prompts 80 \
  --max-concurrency 1 \
  --temperature=0.3 \
  --top-p=0.75 \
  --result-dir "./log/"
```

You can skip applying the chat template if your data already includes it by passing `--custom-skip-chat-template`.

### VisionArena Benchmark for Vision Language Models

```bash
# need a model with vision capability here
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
```

```bash
python3 vllm/benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path lmarena-ai/VisionArena-Chat \
  --hf-split train \
  --num-prompts 1000
```

### InstructCoder Benchmark with Speculative Decoding

```bash
VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
  --speculative-config $'{"method": "ngram",
  "num_speculative_tokens": 5, "prompt_lookup_max": 5,
  "prompt_lookup_min": 2}'
```

```bash
python3 benchmarks/benchmark_serving.py \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --dataset-name hf \
  --dataset-path likaixin/InstructCoder \
  --num-prompts 2048
```

### Other HuggingFaceDataset Examples

```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
```

**`lmms-lab/LLaVA-OneVision-Data`**

```bash
python3 vllm/benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path lmms-lab/LLaVA-OneVision-Data \
  --hf-split train \
  --hf-subset "chart2text(cauldron)" \
  --num-prompts 10
```

**`Aeala/ShareGPT_Vicuna_unfiltered`**

```bash
python3 vllm/benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path Aeala/ShareGPT_Vicuna_unfiltered \
  --hf-split train \
  --num-prompts 10
```

**`AI-MO/aimo-validation-aime`**

```bash
python3 vllm/benchmarks/benchmark_serving.py \
  --model Qwen/QwQ-32B \
  --dataset-name hf \
  --dataset-path AI-MO/aimo-validation-aime \
  --num-prompts 10 \
  --seed 42
```

**`philschmid/mt-bench`**

```bash
python3 vllm/benchmarks/benchmark_serving.py \
  --model Qwen/QwQ-32B \
  --dataset-name hf \
  --dataset-path philschmid/mt-bench \
  --num-prompts 80
```
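
### Sonnet Benchmark

The local `sonnet.txt` file from the table above can also drive the online benchmark. A minimal sketch; the sonnet-specific length flags used here (`--sonnet-input-len`, `--sonnet-output-len`, and the related `--sonnet-prefix-len`) are assumptions about how lines from the file are assembled into prompts:

```bash
# reuses the Hermes server from the first online example
python3 vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --endpoint /v1/completions \
  --dataset-name sonnet \
  --dataset-path vllm/benchmarks/sonnet.txt \
  --sonnet-input-len 550 \
  --sonnet-output-len 150 \
  --num-prompts 10
```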

### Running With Sampling Parameters

When using OpenAI-compatible backends such as `vllm`, optional sampling
parameters can be specified. Example client command:

```bash
python3 vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --endpoint /v1/completions \
  --dataset-name sharegpt \
  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json \
  --top-k 10 \
  --top-p 0.9 \
  --temperature 0.5 \
  --num-prompts 10
```
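
Beyond sampling parameters, the request arrival pattern can be shaped as well. A sketch reusing the ShareGPT setup, assuming `--request-rate` sets an approximate arrival rate in requests per second (by default all requests are sent at once) and `--max-concurrency` caps the number of in-flight requests:

```bash
python3 vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --endpoint /v1/completions \
  --dataset-name sharegpt \
  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json \
  --request-rate 16 \
  --max-concurrency 32 \
  --num-prompts 100
```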

---

## Example - Offline Throughput Benchmark

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --dataset-name sonnet \
  --dataset-path vllm/benchmarks/sonnet.txt \
  --num-prompts 10
```

If successful, you will see the following output:

```
Throughput: 7.15 requests/s, 4656.00 total tokens/s, 1072.15 output tokens/s
Total num prompt tokens: 5014
Total num output tokens: 1500
```
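
The throughput script can also run without a dataset file; a sketch, assuming it falls back to synthetic prompts when only fixed `--input-len`/`--output-len` are given:

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --input-len 512 \
  --output-len 128 \
  --num-prompts 100
```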

### VisionArena Benchmark for Vision Language Models

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --backend vllm-chat \
  --dataset-name hf \
  --dataset-path lmarena-ai/VisionArena-Chat \
  --num-prompts 1000 \
  --hf-split train
```

The `num prompt tokens` now includes image token counts:

```
Throughput: 2.55 requests/s, 4036.92 total tokens/s, 326.90 output tokens/s
Total num prompt tokens: 14527
Total num output tokens: 1280
```

### InstructCoder Benchmark with Speculative Decoding

```bash
VLLM_WORKER_MULTIPROC_METHOD=spawn \
VLLM_USE_V1=1 \
python3 vllm/benchmarks/benchmark_throughput.py \
  --dataset-name=hf \
  --dataset-path=likaixin/InstructCoder \
  --model=meta-llama/Meta-Llama-3-8B-Instruct \
  --input-len=1000 \
  --output-len=100 \
  --num-prompts=2048 \
  --async-engine \
  --speculative-config $'{"method": "ngram",
  "num_speculative_tokens": 5, "prompt_lookup_max": 5,
  "prompt_lookup_min": 2}'
```

```
Throughput: 104.77 requests/s, 23836.22 total tokens/s, 10477.10 output tokens/s
Total num prompt tokens: 261136
Total num output tokens: 204800
```

### Other HuggingFaceDataset Examples

**`lmms-lab/LLaVA-OneVision-Data`**

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --backend vllm-chat \
  --dataset-name hf \
  --dataset-path lmms-lab/LLaVA-OneVision-Data \
  --hf-split train \
  --hf-subset "chart2text(cauldron)" \
  --num-prompts 10
```

**`Aeala/ShareGPT_Vicuna_unfiltered`**

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --backend vllm-chat \
  --dataset-name hf \
  --dataset-path Aeala/ShareGPT_Vicuna_unfiltered \
  --hf-split train \
  --num-prompts 10
```

**`AI-MO/aimo-validation-aime`**

```bash
python3 benchmarks/benchmark_throughput.py \
  --model Qwen/QwQ-32B \
  --backend vllm \
  --dataset-name hf \
  --dataset-path AI-MO/aimo-validation-aime \
  --hf-split train \
  --num-prompts 10
```

### Benchmark with LoRA Adapters

```bash
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
python3 vllm/benchmarks/benchmark_throughput.py \
  --model meta-llama/Llama-2-7b-hf \
  --backend vllm \
  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json \
  --dataset-name sharegpt \
  --num-prompts 10 \
  --max-loras 2 \
  --max-lora-rank 8 \
  --enable-lora \
  --lora-path yard1/llama-2-7b-sql-lora-test
```
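
For machine-readable results, a sketch assuming `benchmark_throughput.py` accepts `--output-json` for writing the summary metrics to a file:

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --dataset-name sonnet \
  --dataset-path vllm/benchmarks/sonnet.txt \
  --num-prompts 10 \
  --output-json results.json
```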
