Commit aaf376b

Update InternVL3.md
1 parent e3200d0 commit aaf376b

OpenGVLab/InternVL3.md

Lines changed: 36 additions & 27 deletions
@@ -84,45 +84,54 @@ The result would be like this:

  ### Benchmarking Performance

- Take InternVL3-8B-hf as an example:
+ Take InternVL3-8B-hf as an example, using random multimodal dataset mentioned in [PR:Feature/benchmark/random mm data/images](https://github.com/vllm-project/vllm/pull/23119):

  ```bash
  # need to start vLLM service first
  vllm bench serve \
- --host 0.0.0.0 \
- --port 8000 \
- --model OpenGVLab/InternVL3-8B-hf \
- --dataset-name random \
- --random-input-len 2048 \
- --random-output-len 1024 \
- --max-concurrency 10 \
- --num-prompts 50 \
- --ignore-eos
+ --host 0.0.0.0 \
+ --port 8000 \
+ --model OpenGVLab/InternVL3-8B-hf \
+ --dataset-name random-mm \
+ --num-prompts 100 \
+ --max-concurrency 10 \
+ --random-prefix-len 25 \
+ --random-input-len 300 \
+ --random-output-len 40 \
+ --random-range-ratio 0.2 \
+ --random-mm-base-items-per-request 0 \
+ --random-mm-num-mm-items-range-ratio 0 \
+ --random-mm-limit-mm-per-prompt '{"image":3,"video":0}' \
+ --random-mm-bucket-config '{(256, 256, 1): 0.25, (720, 1280, 1): 0.75}' \
+ --request-rate inf \
+ --ignore-eos \
+ --endpoint-type openai-chat \
+ --endpoint "/v1/chat/completions" \
+ --seed 42
  ```
  If it works successfully, you will see the following output.

  ```
  ============ Serving Benchmark Result ============
- Successful requests: 50
+ Successful requests: 100
  Maximum request concurrency: 10
- Benchmark duration (s): 247.46
- Total input tokens: 101987
- Total generated tokens: 51200
- Request throughput (req/s): 0.20
- Output token throughput (tok/s): 206.90
- Total Token throughput (tok/s): 619.04
+ Benchmark duration (s): 24.54
+ Total input tokens: 32805
+ Total generated tokens: 3982
+ Request throughput (req/s): 4.07
+ Output token throughput (tok/s): 162.25
+ Total Token throughput (tok/s): 1498.91
  ---------------Time to First Token----------------
- Mean TTFT (ms): 932.11
- Median TTFT (ms): 854.60
- P99 TTFT (ms): 1845.91
+ Mean TTFT (ms): 198.18
+ Median TTFT (ms): 158.99
+ P99 TTFT (ms): 524.05
  -----Time per Output Token (excl. 1st token)------
- Mean TPOT (ms): 47.44
- Median TPOT (ms): 47.53
- P99 TPOT (ms): 48.26
+ Mean TPOT (ms): 55.56
+ Median TPOT (ms): 56.04
+ P99 TPOT (ms): 60.32
  ---------------Inter-token Latency----------------
- Mean ITL (ms): 47.44
- Median ITL (ms): 46.14
- P99 ITL (ms): 54.76
+ Mean ITL (ms): 54.22
+ Median ITL (ms): 47.02
+ P99 ITL (ms): 116.90
  ==================================================
-
  ```
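
As a quick consistency check on the updated sample output, the three throughput figures can be recomputed from the raw counters in the same summary (successful requests, token totals, and benchmark duration). The following is a minimal sketch, not part of the commit: the helper name `rate` is hypothetical, and the numbers are copied from the `+` lines of the diff.

```bash
#!/usr/bin/env bash
# Recompute the derived throughput figures from the raw counters in the
# benchmark summary. `rate` is an illustrative helper, not a vLLM command.

rate() {
  # $1 = count, $2 = duration in seconds; prints count / duration to 2 decimals
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

duration=24.54        # Benchmark duration (s), as displayed
requests=100          # Successful requests
input_tokens=32805    # Total input tokens
output_tokens=3982    # Total generated tokens

echo "req/s:          $(rate "$requests" "$duration")"
echo "output tok/s:   $(rate "$output_tokens" "$duration")"
echo "total tok/s:    $(rate $((input_tokens + output_tokens)) "$duration")"
```

With the displayed duration of 24.54 s this gives about 4.07 req/s, 162.27 output tok/s, and 1499.06 total tok/s. The small gap from the reported 162.25 and 1498.91 is consistent with the displayed duration being rounded to two decimals, though that is an inference from the numbers rather than something the commit states.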
