
Commit e24033c

merge main and resolve conflicts

Signed-off-by: dalthecow <[email protected]>

1 parent 46b5e87 commit e24033c

23 files changed: +207 −83 lines

.github/workflows/container-maintenance.yml

Lines changed: 12 additions & 8 deletions

```diff
@@ -9,19 +9,22 @@ on:
 concurrency:
   group: ${{ github.workflow }}
 
+permissions:
+  packages: write
+
 jobs:
   cleanup-container-tags:
     runs-on: ubuntu-latest
     steps:
       - name: Delete PR and untagged images older than 2 weeks
        uses: snok/[email protected]
        with:
-          account: ${{ github.actor }}
+          account: ${{ github.repository_owner }}
          token: ${{ github.token }}
          image-names: ${{ github.event.repository.name }}
          image-tags: "pr-*"
          cut-off: 2w
-          dry-run: true
+          dry-run: false
 
   push-container-tags:
     runs-on: ubuntu-latest
@@ -31,19 +34,20 @@ jobs:
       - name: Log into ghcr.io
        uses: redhat-actions/podman-login@v1
        with:
-          username: ${{ github.actor }}
+          username: ${{ github.repository_owner }}
          password: ${{ github.token }}
          registry: ghcr.io/${{ github.repository_owner }}
       - name: Get list of tags
        run: |
-          skopeo list-tags docker://${{ github.repository }} | jq --raw-output '.Tags[]' > tags
+          set -euo pipefail # Fail pipe if any command fails
+          skopeo list-tags docker://ghcr.io/${{ github.repository }} | jq --raw-output '.Tags[]' > tags
       - name: Get latest release and rc tags
        run: |
          STABLE_TAG="$(grep -P '^v\d+\.\d+\.\d+$' tags | sort -rV | head -n1)"
-          echo "STABLE_TAG=${STABLE_TAG:-v0.0.0}" >> $GITHUB_ENV
+          echo "stable_tag=${STABLE_TAG:-v0.0.0}" >> $GITHUB_ENV
          LATEST_TAG="$(grep -P '^v\d+\.\d+\.\d+' tags | sort -rV | head -n1)"
-          echo "LATEST_TAG=${LATEST_TAG:-v0.0.0}" >> $GITHUB_ENV
+          echo "latest_tag=${LATEST_TAG:-v0.0.0}" >> $GITHUB_ENV
       - name: Update latest and stable tags
        run: |
-          skopeo copy docker://${{ github.repository }}:${{ env.stable_tag }} docker://${{ github.repository }}:stable
-          skopeo copy docker://${{ github.repository }}:${{ env.latest_tag }} docker://${{ github.repository }}:latest
+          skopeo copy docker://ghcr.io/${{ github.repository }}:${{ env.stable_tag }} docker://ghcr.io/${{ github.repository }}:stable
+          skopeo copy docker://ghcr.io/${{ github.repository }}:${{ env.latest_tag }} docker://ghcr.io/${{ github.repository }}:latest
```
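The tag-selection step above leans on `grep -P '^v\d+\.\d+\.\d+$'` and GNU version sort (`sort -rV | head -n1`). A minimal Python sketch of the same selection logic, using made-up tags (this is illustrative, not the workflow's actual code):

```python
import re

# hypothetical tag list, as `skopeo list-tags` might return it
tags = ["v0.9.0", "v1.2.3-rc1", "v1.10.0", "latest", "pr-123"]

# STABLE_TAG filter: strictly vMAJOR.MINOR.PATCH, like grep -P '^v\d+\.\d+\.\d+$'
stable = [t for t in tags if re.fullmatch(r"v\d+\.\d+\.\d+", t)]

# sort -rV compares components numerically, so v1.10.0 outranks v1.2.3
def version_key(tag: str) -> tuple:
    return tuple(int(p) for p in tag[1:].split("."))

# default mirrors the workflow's ${STABLE_TAG:-v0.0.0} fallback
stable_tag = max(stable, key=version_key, default="v0.0.0")
print(stable_tag)  # v1.10.0
```

Note how a plain lexical sort would have picked `v1.2.3-rc1` first; the numeric component comparison is what makes `sort -rV` (and this sketch) correct.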

.github/workflows/release-candidate.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -228,7 +228,7 @@ jobs:
         uses: peaceiris/actions-gh-pages@v3
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: ./ui/out
+          publish_dir: .src/ui/out
           destination_dir: ui/release/${TAG}
           keep_files: false
           user_name: ${{ github.actor }}
```

.github/workflows/release.yml

Lines changed: 5 additions & 2 deletions

```diff
@@ -227,7 +227,7 @@ jobs:
         uses: peaceiris/actions-gh-pages@v3
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: ./ui/out
+          publish_dir: ./src/ui/out
           destination_dir: ui/${TAG}
           keep_files: false
           user_name: ${{ github.actor }}
@@ -297,7 +297,10 @@ jobs:
         with:
           fetch-depth: 0
       - name: Get version from branch
-        run: echo "PACKAGE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
+        run: |
+          GITHUB_REF="${{ github.ref }}"
+          [[ -z "$GITHUB_REF" ]] && exit 1 # Fail if ref is unset
+          echo "PACKAGE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
       - name: Buildah build
         id: build-image
         uses: redhat-actions/buildah-build@v2
```
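The `${GITHUB_REF#refs/*/}` expansion in the step above removes the shortest leading `refs/<kind>/` prefix, leaving just the tag or branch name. A quick sketch of the equivalent logic in Python (the ref values are made-up examples):

```python
import re

def package_version(ref: str) -> str:
    # bash ${ref#refs/*/}: strip the shortest leading match of refs/*/
    return re.sub(r"^refs/[^/]*/", "", ref)

print(package_version("refs/tags/v1.2.3"))    # v1.2.3
print(package_version("refs/heads/release"))  # release
```

The "shortest match" detail matters: for a branch like `refs/heads/feature/x`, only `refs/heads/` is stripped, so the result keeps the nested path `feature/x`.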

docs/assets/sample-output1.png (324 KB)

docs/assets/sample-output2.png (298 KB)

docs/assets/sample-output3.png (186 KB)
Lines changed: 117 additions & 0 deletions

# GuideLLM Benchmark Testing Best Practice

A first, easy guidellm benchmark test from scratch, using the vLLM Simulator.
## Getting Started

### 📦 1. Benchmark Testing Environment Setup

#### 1.1 Create a Conda Environment (recommended)

```bash
conda create -n guidellm-bench python=3.11 -y
conda activate guidellm-bench
```

#### 1.2 Install Dependencies

```bash
git clone https://github.com/vllm-project/guidellm.git
cd guidellm
pip install guidellm
```

For more detailed instructions, refer to the [GuideLLM README](https://github.com/vllm-project/guidellm/blob/main/README.md).

#### 1.3 Verify Installation

```bash
guidellm --help
```

#### 1.4 Start an OpenAI-compatible API in the vLLM simulator Docker container

```bash
docker pull ghcr.io/llm-d/llm-d-inference-sim:v0.4.0

docker run --rm --publish 8000:8000 \
  ghcr.io/llm-d/llm-d-inference-sim:v0.4.0 \
  --port 8000 \
  --model "Qwen/Qwen2.5-1.5B-Instruct" \
  --lora-modules '{"name":"tweet-summary-0"}' '{"name":"tweet-summary-1"}'
```

For more detailed instructions, refer to the [vLLM Simulator](https://llm-d.ai/docs/architecture/Components/inference-sim) documentation.

Docker image versions: [Docker Images](https://github.com/llm-d/llm-d-inference-sim/pkgs/container/llm-d-inference-sim)

Verify that the OpenAI-compatible API is working via curl:

- Check /v1/models:

```bash
curl --request GET 'http://localhost:8000/v1/models'
```

- Check /v1/chat/completions:

```bash
curl --request POST 'http://localhost:8000/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "messages": [{"role": "user", "content": "Say this is a test!"}]
  }'
```

- Check /v1/completions:

```bash
curl --request POST 'http://localhost:8000/v1/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "prompt": "Say this is a test!",
    "max_tokens": 128
  }'
```
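A successful `/v1/chat/completions` call returns the generated text under `choices[0].message.content`. A small sketch of pulling that field out of the response body (the JSON below is a hand-written sample in the OpenAI response shape, not captured simulator output):

```python
import json

# hand-written sample response in the OpenAI chat-completions shape
body = '{"choices": [{"message": {"role": "assistant", "content": "This is a test!"}}]}'

reply = json.loads(body)["choices"][0]["message"]["content"]
print(reply)  # This is a test!
```

The plain `/v1/completions` endpoint differs only in shape: the text comes back under `choices[0].text` instead of a nested `message` object.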
#### 1.5 Download the Tokenizer

Download the Qwen/Qwen2.5-1.5B-Instruct tokenizer files from [Qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct/files) and save them to a local path such as ${local_path}/Qwen2.5-1.5B-Instruct:

```bash
ls ./Qwen2.5-1.5B-Instruct
merges.txt  tokenizer.json  tokenizer_config.json  vocab.json
```

______________________________________________________________________

## 🚀 2. Running Benchmarks

```bash
guidellm benchmark \
  --target "http://localhost:8000/" \
  --model "tweet-summary-0" \
  --processor "${local_path}/Qwen2.5-1.5B-Instruct" \
  --rate-type sweep \
  --max-seconds 10 \
  --max-requests 10 \
  --data "prompt_tokens=128,output_tokens=56"
```

______________________________________________________________________

## 📊 3. Results Interpretation

![sample output 1](../assets/sample-output1.png) ![sample output 2](../assets/sample-output2.png) ![sample output 3](../assets/sample-output3.png)

After the benchmark completes, the key metrics in the report are:

- **`TTFT`**: Time to First Token
- **`TPOT`**: Time Per Output Token
- **`ITL`**: Inter-Token Latency

Your first benchmark test is complete.
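All three metrics can be read off a list of token-arrival timestamps. A minimal sketch with made-up timings (illustrative only, not guidellm's implementation): the mean of the inter-token gaps coincides with the per-request TPOT average, but ITL's worst gap exposes a mid-stream stall that the average hides.

```python
# arrival time (s) of each output token after the request was sent (made-up data)
token_times = [0.30, 0.35, 0.40, 0.90, 0.95]

ttft = token_times[0]                                      # time to first token
gaps = [b - a for a, b in zip(token_times, token_times[1:])]
itl_mean = sum(gaps) / len(gaps)                           # inter-token latency (mean)
tpot = (token_times[-1] - ttft) / (len(token_times) - 1)   # time per output token

print(ttft, round(itl_mean, 4), round(tpot, 4), round(max(gaps), 2))
```

Mean ITL and TPOT are algebraically identical here (both divide the same span by the same count); the distributions only diverge at the tails, which is why reports show ITL percentiles.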

src/guidellm/presentation/data_models.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -208,7 +208,7 @@ def from_distribution_summary(
 
 class BenchmarkDatum(BaseModel):
     requests_per_second: float
-    tpot: TabularDistributionSummary
+    itl: TabularDistributionSummary
     ttft: TabularDistributionSummary
     throughput: TabularDistributionSummary
     time_per_request: TabularDistributionSummary
@@ -217,7 +217,7 @@ class BenchmarkDatum(BaseModel):
     def from_benchmark(cls, bm: "GenerativeBenchmark"):
         return cls(
             requests_per_second=bm.metrics.requests_per_second.successful.mean,
-            tpot=TabularDistributionSummary.from_distribution_summary(
+            itl=TabularDistributionSummary.from_distribution_summary(
                 bm.metrics.inter_token_latency_ms.successful
            ),
            ttft=TabularDistributionSummary.from_distribution_summary(
```
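The rename only touches the field name: the value was already populated from `inter_token_latency_ms`, so `itl` now says what the datum actually holds. A hypothetical before/after sketch of the shape, as a plain dataclass rather than the pydantic model (field values are made up):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkDatum:
    requests_per_second: float
    itl: float   # was `tpot`; fed from metrics.inter_token_latency_ms
    ttft: float  # time to first token (ms)

# hypothetical values, for illustration only
datum = BenchmarkDatum(requests_per_second=12.5, itl=18.3, ttft=210.0)
print(datum.itl)
```

Callers that still read `.tpot` get an `AttributeError` after this change, which is why the UI component below is updated in the same commit.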

src/guidellm/settings.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -32,8 +32,8 @@ class Environment(str, Enum):
 
 
 ENV_REPORT_MAPPING = {
-    Environment.PROD: "https://blog.vllm.ai/guidellm/ui/latest/index.html",
-    Environment.STAGING: "https://blog.vllm.ai/guidellm/ui/release/latest/index.html",
+    Environment.PROD: "https://blog.vllm.ai/guidellm/ui/v0.3.0/index.html",
+    Environment.STAGING: "https://blog.vllm.ai/guidellm/ui/release/v0.3.0/index.html",
     Environment.DEV: "https://blog.vllm.ai/guidellm/ui/dev/index.html",
     Environment.LOCAL: "http://localhost:3000/index.html",
 }
```

src/ui/lib/components/MetricsSummary/MetricsSummary.component.tsx

Lines changed: 15 additions & 15 deletions

```diff
@@ -54,15 +54,15 @@ export const Component = () => {
 
   const {
     ttft: ttftSLO,
-    tpot: tpotSLO,
+    itl: itlSLO,
     timePerRequest: timePerRequestSLO,
     throughput: throughputSLO,
     percentile,
     minX,
     maxX,
     errors,
     handleTtft,
-    handleTpot,
+    handleItl,
     handleTimePerRequest,
     handleThroughput,
     handlePercentileChange,
@@ -72,8 +72,8 @@ export const Component = () => {
   const isTtftMatch = Boolean(
     ttftSLO && interpolatedMetricData.ttft.enforcedPercentileValue <= ttftSLO
   );
-  const isTpotMatch = Boolean(
-    tpotSLO && interpolatedMetricData.tpot.enforcedPercentileValue <= tpotSLO
+  const isItlMatch = Boolean(
+    itlSLO && interpolatedMetricData.itl.enforcedPercentileValue <= itlSLO
   );
   const isTprMatch = Boolean(
     timePerRequestSLO &&
@@ -123,7 +123,7 @@ export const Component = () => {
       <FieldsContainer data-id="fields-container">
         <FieldCell data-id="field-cell-1">
           <Input
-            label="TTFT (ms)"
+            label="TIME TO FIRST TOKEN (ms)"
             value={ttftSLO}
             onChange={handleTtft}
             fullWidth
@@ -133,12 +133,12 @@ export const Component = () => {
         </FieldCell>
         <FieldCell data-id="field-cell-2">
           <Input
-            label="TPOT (ms)"
-            value={tpotSLO}
-            onChange={handleTpot}
+            label="INTER-TOKEN LATENCY (ms)"
+            value={itlSLO}
+            onChange={handleItl}
             fullWidth
             fontColor={LineColor.Secondary}
-            error={errors?.tpot}
+            error={errors?.itl}
           />
         </FieldCell>
         <FieldCell data-id="field-cell-3">
@@ -212,7 +212,7 @@ export const Component = () => {
         </MiddleColumn>
         <MiddleColumn item xs={3}>
           <MetricValue
-            label="TTFT"
+            label="time to first token"
             value={`${formatNumber(interpolatedMetricData.ttft.enforcedPercentileValue)} ms`}
             match={isTtftMatch}
             valueColor={LineColor.Primary}
@@ -222,17 +222,17 @@ export const Component = () => {
         <MiddleColumn sx={{ paddingLeft: '0px !important' }} item xs={9}>
           <GraphContainer>
             <MetricLine
-              data={[{ id: 'tpot', data: lineDataByRps.tpot || [] }]}
-              threshold={tpotSLO}
+              data={[{ id: 'itl', data: lineDataByRps.itl || [] }]}
+              threshold={itlSLO}
               lineColor={LineColor.Secondary}
             />
           </GraphContainer>
         </MiddleColumn>
         <MiddleColumn item xs={3}>
           <MetricValue
-            label="TPOT"
-            value={`${formatNumber(interpolatedMetricData.tpot.enforcedPercentileValue)} ms`}
-            match={isTpotMatch}
+            label="inter-token latency"
+            value={`${formatNumber(interpolatedMetricData.itl.enforcedPercentileValue)} ms`}
+            match={isItlMatch}
             valueColor={LineColor.Secondary}
           />
         </MiddleColumn>
```
