Commit 6300876: first practice example
Signed-off-by: guangli.bao <[email protected]>

# GuideLLM Benchmark Testing Best Practices

[https://console.d.run/](https://console.d.run/) is an AI infrastructure platform that hosts playgrounds for deployed models. This guide walks through a first `guidellm` benchmark testing practice example against one of its chat models.
## Getting Started

### 📦 1. Benchmark Testing Environment Setup

#### 1.1 Create a Conda Environment (recommended)

```bash
conda create -n guidellm-bench python=3.11 -y
conda activate guidellm-bench
```

#### 1.2 Install Dependencies

```bash
git clone https://github.com/vllm-project/guidellm.git
cd guidellm
pip install .
```

For more detailed instructions, refer to the [GuideLLM README](https://github.com/vllm-project/guidellm/blob/main/README.md).

#### 1.3 Verify Installation
27+
28+
```bash
29+
guidellm --help
30+
```
31+
32+
#### 1.4 Apply for Account and API Key in D.run
33+
34+
Firstly, register an account, refer to [D.run Registration](https://docs.d.run/en/#register-account); then, create an API key, refer to [D.run API Key](https://docs.d.run/en/#register-account); finally, charge your account at [D.run Account Management](https://docs.d.run/en/#register-account).
35+
36+
#### 1.5 Chat with Model in D.run
37+
38+
Check if you can use the chat model in D.run.
39+
40+
![alt text](image.png)
41+
42+
#### 1.6 Find Out the HTTP Request URL and Body
43+
44+
Use the Developer Tool in Chrome browser or press F12 to open Network, then chat with the LLM model to capture the HTTP request URL and body.
45+
46+
![alt text](image-1.png)
47+
48+
![alt text](image-2.png)
49+
50+
In this request, the vllm backend service URL is `https://chat.d.run`; vllm model is `public/qwen2.5-72b-instruct-awq`. These two pieces of information will be used in the following benchmark command.
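
The captured request body can also be reconstructed outside the browser. The sketch below is illustrative only: it assumes D.run exposes an OpenAI-compatible chat completions API (the convention `guidellm`'s OpenAI backend targets), and the user message is made up; the model name is the one captured above.

```python
import json

# Hypothetical request body mirroring the captured chat request.
# The message content is a placeholder; only the model name comes
# from the captured traffic.
payload = {
    "model": "public/qwen2.5-72b-instruct-awq",
    "messages": [
        {"role": "user", "content": "Hello, who are you?"},
    ],
    "stream": True,
}

body = json.dumps(payload)
print(body)
```

With a valid API key, a body like this could be POSTed to the captured URL (e.g. via `curl` with an `Authorization: Bearer` header) to confirm the endpoint works before benchmarking.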
#### 1.7 Download a Chat Dataset from Modelscope
53+
54+
Download the chat dataset JSON file `Open-Source-Meeseeks-high-quality.json` from [Modelscope - Meeseeks](https://modelscope.cn/datasets/meituan/Meeseeks/files).
55+
56+
![alt text](image-3.png)
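
Before benchmarking, it is worth confirming the dataset actually contains the column you will pass via `--data-args` (`prompt` in the command below). A small sketch, assuming the file is a JSON array of records with a `prompt` field; the tiny sample written here stands in for the downloaded file:

```python
import json

# Stand-in sample with the assumed shape of the downloaded dataset:
# a JSON array of records, each carrying a "prompt" field.
sample = [
    {"prompt": "Summarize the meeting notes."},
    {"prompt": "Write a haiku about autumn."},
]
path = "sample-dataset.json"  # replace with Open-Source-Meeseeks-high-quality.json
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Inspect the prompt column the benchmark will read
with open(path, encoding="utf-8") as f:
    records = json.load(f)
prompts = [r["prompt"] for r in records]
print(len(prompts), prompts[0])
```

If the real file uses a different column name, adjust `prompt_column` in `--data-args` to match.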
---
## 🚀 2. Running Benchmarks

```bash
export GUIDELLM__OPENAI__API_KEY="${api_key}"
guidellm benchmark \
  --target "https://chat.d.run/" \
  --model "public/qwen2.5-72b-instruct-awq" \
  --rate-type "throughput" \
  --data-args '{"prompt_column": "prompt", "split": "train"}' \
  --max-requests 10 \
  --data "${local_path}/Open-Source-Meeseeks-high-quality.json"
```

---

## 📊 3. Results Interpretation

![alt text](image-4.png)

After the benchmark completes, the report presents the key metrics, including:

* **`TTFT`**: Time to First Token
* **`TPOT`**: Time Per Output Token
* **`ITL`**: Inter-Token Latency

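These metrics relate to end-to-end request latency in a simple way: after the first token arrives (TTFT), each remaining output token adds roughly one TPOT (ITL is the per-gap view of the same stream). A back-of-the-envelope sketch with made-up numbers:

```python
# Hypothetical per-request numbers, for illustration only
ttft_ms = 250.0      # time to first token
tpot_ms = 20.0       # time per output token
output_tokens = 100

# Total latency is roughly TTFT plus one TPOT per remaining token
total_ms = ttft_ms + tpot_ms * (output_tokens - 1)
print(f"estimated end-to-end latency: {total_ms:.0f} ms")
```

Reading your own report's TTFT and TPOT through this lens makes it easy to see whether latency is dominated by queueing/prefill (TTFT) or by decode speed (TPOT).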
The first benchmark test is complete.
