@@ -93,8 +93,8 @@ vllm bench serve \
--port 8000 \
--model OpenGVLab/InternVL3-8B-hf \
--dataset-name random \
- --random-input 2048 \
- --random-output 1024 \
+ --random-input-len 2048 \
+ --random-output-len 1024 \
--max-concurrency 10 \
--num-prompts 50 \
--ignore-eos
@@ -103,24 +103,26 @@ If it works successfully, you will see the following output.
```
============ Serving Benchmark Result ============
- Successful requests: 497
- Benchmark duration (s): 229.42
- Total input tokens: 507680
- Total generated tokens: 62259
- Request throughput (req/s): 2.17
- Output token throughput (tok/s): 271.37
- Total Token throughput (tok/s): 2484.22
+ Successful requests: 50
+ Maximum request concurrency: 10
+ Benchmark duration (s): 247.46
+ Total input tokens: 101987
+ Total generated tokens: 51200
+ Request throughput (req/s): 0.20
+ Output token throughput (tok/s): 206.90
+ Total Token throughput (tok/s): 619.04
---------------Time to First Token----------------
- Mean TTFT (ms): 102429.40
- Median TTFT (ms): 99644.38
- P99 TTFT (ms): 213820.81
+ Mean TTFT (ms): 932.11
+ Median TTFT (ms): 854.60
+ P99 TTFT (ms): 1845.91
-----Time per Output Token (excl. 1st token)------
- Mean TPOT (ms): 664.26
- Median TPOT (ms): 776.39
- P99 TPOT (ms): 848.52
+ Mean TPOT (ms): 47.44
+ Median TPOT (ms): 47.53
+ P99 TPOT (ms): 48.26
---------------Inter-token Latency----------------
- Mean ITL (ms): 661.73
- Median ITL (ms): 844.15
- P99 ITL (ms): 856.42
+ Mean ITL (ms): 47.44
+ Median ITL (ms): 46.14
+ P99 ITL (ms): 54.76
==================================================
+
```
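
The updated sample output can be checked against the command's flags. The sketch below is not part of the benchmark tool. It assumes each throughput figure is a count divided by the benchmark duration, and that with `--ignore-eos` every request generates exactly `--random-output-len` tokens. All variable names are illustrative, and the numbers are copied from the added (`+`) lines.

```python
# Figures copied from the updated ("+") sample output.
num_prompts = 50               # --num-prompts
random_output_len = 1024       # --random-output-len
duration_s = 247.46            # Benchmark duration (s)
total_input_tokens = 101987    # Total input tokens
total_generated_tokens = 51200 # Total generated tokens


def per_second(count: float, seconds: float) -> float:
    """Assumed definition of a throughput figure: count / duration."""
    return count / seconds


# With --ignore-eos, every request is assumed to run to the full output length.
expected_generated = num_prompts * random_output_len

request_tput = per_second(num_prompts, duration_s)
output_tput = per_second(total_generated_tokens, duration_s)
total_tput = per_second(total_input_tokens + total_generated_tokens, duration_s)

print(f"expected generated tokens: {expected_generated}")
print(f"request throughput (req/s): {request_tput:.2f}")
print(f"output token throughput (tok/s): {output_tput:.2f}")
print(f"total token throughput (tok/s): {total_tput:.2f}")
```

Under those assumptions the derived values match the reported ones (51200 generated tokens, 0.20 req/s, 206.90 tok/s, 619.04 tok/s), so the new sample output is consistent with `--num-prompts 50` and `--random-output-len 1024`.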