
Commit 5c9982a

first benchmark testing example (#328)
## Summary

<img width="1757" height="1212" alt="image" src="https://github.com/user-attachments/assets/fbfddeac-ca56-40c0-b7ae-d2f17d50823a" />
2 parents ad25e06 + 2c0d993 commit 5c9982a

File tree

4 files changed (+117, −0 lines)


docs/assets/sample-output1.png (324 KB)
docs/assets/sample-output2.png (298 KB)
docs/assets/sample-output3.png (186 KB)
Lines changed: 117 additions & 0 deletions

# GuideLLM Benchmark Testing Best Practice

Run your first GuideLLM benchmark test from scratch using the vLLM Simulator.

## Getting Started

### 📦 1. Benchmark Testing Environment Setup

#### 1.1 Create a Conda Environment (recommended)

```bash
conda create -n guidellm-bench python=3.11 -y
conda activate guidellm-bench
```
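
Optionally, confirm that the new environment is the one in use before installing anything (generic shell checks, nothing GuideLLM-specific):

```bash
# The interpreter should now resolve inside the guidellm-bench environment.
python --version   # expect Python 3.11.x
which python       # path should contain .../envs/guidellm-bench/...
```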

#### 1.2 Install Dependencies

```bash
git clone https://github.com/vllm-project/guidellm.git
cd guidellm
# Install from the cloned source (use `pip install guidellm` for the PyPI release).
pip install .
```

For more detailed instructions, refer to the [GuideLLM README](https://github.com/vllm-project/guidellm/blob/main/README.md).

#### 1.3 Verify Installation

```bash
guidellm --help
```
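
If the command prints its usage text, the installation works. To preview the options used in section 2, the subcommand should accept `--help` as well (assuming the usual CLI convention):

```bash
# Show the flags accepted by the benchmark subcommand.
guidellm benchmark --help
```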

#### 1.4 Start an OpenAI-Compatible API in a vLLM Simulator Docker Container

```bash
docker pull ghcr.io/llm-d/llm-d-inference-sim:v0.4.0

docker run --rm --publish 8000:8000 \
  ghcr.io/llm-d/llm-d-inference-sim:v0.4.0 \
  --port 8000 \
  --model "Qwen/Qwen2.5-1.5B-Instruct" \
  --lora-modules '{"name":"tweet-summary-0"}' '{"name":"tweet-summary-1"}'
```
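
Before sending requests, it can help to confirm the container is up and publishing port 8000. The sketch below uses only standard Docker commands, nothing simulator-specific:

```bash
# Confirm a container is running and publishing port 8000.
docker ps --filter publish=8000

# Tail the simulator logs; replace <container-id> with the id from `docker ps`.
docker logs --follow <container-id>
```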

For more detailed instructions, refer to the [vLLM Simulator](https://llm-d.ai/docs/architecture/Components/inference-sim) documentation.

Docker image versions: [Docker Images](https://github.com/llm-d/llm-d-inference-sim/pkgs/container/llm-d-inference-sim)

Verify that the OpenAI-compatible API is working via curl:
- check /v1/models
51+
52+
```bash
53+
curl --request GET 'http://localhost:8000/v1/models'
54+
```
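
Assuming the simulator returns the standard OpenAI model-list schema, the registered model ids (including the LoRA names above) can be listed with `jq`:

```bash
# Print just the model ids from the OpenAI-style model list (requires jq).
curl --silent 'http://localhost:8000/v1/models' | jq -r '.data[].id'
```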

- Check `/v1/chat/completions`:

```bash
curl --request POST 'http://localhost:8000/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "messages": [{"role": "user", "content": "Say this is a test!"}]
  }'
```
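
To pull out just the generated reply, assuming `jq` is installed and the response follows the standard OpenAI chat schema:

```bash
# Extract only the assistant message from the chat response (requires jq).
curl --silent --request POST 'http://localhost:8000/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "messages": [{"role": "user", "content": "Say this is a test!"}]
  }' | jq -r '.choices[0].message.content'
```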

- Check `/v1/completions`:

```bash
curl --request POST 'http://localhost:8000/v1/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "prompt": "Say this is a test!",
    "max_tokens": 128
  }'
```

#### 1.5 Download the Tokenizer

Download the Qwen/Qwen2.5-1.5B-Instruct tokenizer files from [Qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct/files) and save them to a local path such as `${local_path}/Qwen2.5-1.5B-Instruct`:

```bash
ls ./Qwen2.5-1.5B-Instruct
merges.txt tokenizer.json tokenizer_config.json vocab.json
```
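
As one alternative sketch for fetching the files, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed and the model is also mirrored on Hugging Face:

```bash
# Download only the tokenizer files from the Hugging Face mirror of the model.
pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct \
  --include "tokenizer*" "vocab.json" "merges.txt" \
  --local-dir ./Qwen2.5-1.5B-Instruct
```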

______________________________________________________________________

## 🚀 2. Running Benchmarks

```bash
guidellm benchmark \
  --target "http://localhost:8000/" \
  --model "tweet-summary-0" \
  --processor "${local_path}/Qwen2.5-1.5B-Instruct" \
  --rate-type sweep \
  --max-seconds 10 \
  --max-requests 10 \
  --data "prompt_tokens=128,output_tokens=56"
```
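
A fixed-rate run can be a useful variation. The sketch below assumes your GuideLLM version supports `--rate-type constant` with `--rate` and saving results via `--output-path`, as described in the GuideLLM README; check `guidellm benchmark --help` if the flags differ:

```bash
# Fixed-rate variant: send ~5 requests/second and save results to a file.
guidellm benchmark \
  --target "http://localhost:8000/" \
  --model "tweet-summary-0" \
  --processor "${local_path}/Qwen2.5-1.5B-Instruct" \
  --rate-type constant \
  --rate 5 \
  --max-seconds 30 \
  --data "prompt_tokens=128,output_tokens=56" \
  --output-path benchmarks.json
```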

______________________________________________________________________

## 📊 3. Results Interpretation

![Benchmark summary output](../assets/sample-output1.png)
![Benchmark metrics output](../assets/sample-output2.png)
![Benchmark detailed stats output](../assets/sample-output3.png)

After the benchmark completes, the key metrics are reported clearly, including:

- **`TTFT`**: Time to First Token
- **`TPOT`**: Time Per Output Token
- **`ITL`**: Inter-Token Latency
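
As a rough mental model of how these metrics fit together (a common convention; GuideLLM's exact definitions may differ slightly):

```latex
% Approximate per-request latency in terms of the token-level metrics,
% where n is the number of output tokens (a common convention, not
% necessarily GuideLLM's exact formula).
\text{latency} \approx \text{TTFT} + \text{TPOT} \times (n - 1)
```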

Your first benchmark test is now complete.
