
Commit fa831c5

Resolve conflicts

Signed-off-by: Thomas Parnell <[email protected]>
2 parents: c698db3 + bfc1edc

873 files changed (+48394 / -26643 lines)

.buildkite/generate_index.py

Lines changed: 21 additions & 2 deletions
```diff
@@ -8,7 +8,8 @@
 <html>
 <body>
 <h1>Links for vLLM</h1/>
-<a href="../{wheel_html_escaped}">{wheel}</a><br/>
+<a href="../{x86_wheel_html_escaped}">{x86_wheel}</a><br/>
+<a href="../{arm_wheel_html_escaped}">{arm_wheel}</a><br/>
 </body>
 </html>
 """
@@ -21,7 +22,25 @@
 
 with open("index.html", "w") as f:
     print(f"Generated index.html for {args.wheel}")
+    # sync the abi tag with .buildkite/scripts/upload-wheels.sh
+    if "x86_64" in filename:
+        x86_wheel = filename
+        arm_wheel = filename.replace("x86_64", "aarch64").replace(
+            "manylinux1", "manylinux2014"
+        )
+    elif "aarch64" in filename:
+        x86_wheel = filename.replace("aarch64", "x86_64").replace(
+            "manylinux2014", "manylinux1"
+        )
+        arm_wheel = filename
+    else:
+        raise ValueError(f"Unsupported wheel: {filename}")
     # cloudfront requires escaping the '+' character
     f.write(
-        template.format(wheel=filename, wheel_html_escaped=filename.replace("+", "%2B"))
+        template.format(
+            x86_wheel=x86_wheel,
+            x86_wheel_html_escaped=x86_wheel.replace("+", "%2B"),
+            arm_wheel=arm_wheel,
+            arm_wheel_html_escaped=arm_wheel.replace("+", "%2B"),
+        )
     )
```
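
The pairing logic above is small enough to exercise on its own. Here is a minimal standalone sketch of the same transformation; the helper name and sample wheel filename are illustrative assumptions, not part of the commit:

```python
# Hypothetical standalone version of the pairing logic added above; the
# helper name and the sample filename below are illustrative only.
def derive_wheel_pair(filename: str) -> tuple[str, str]:
    """Return (x86_wheel, arm_wheel) given a wheel filename for either arch."""
    if "x86_64" in filename:
        x86_wheel = filename
        # aarch64 wheels carry the newer manylinux2014 tag.
        arm_wheel = filename.replace("x86_64", "aarch64").replace(
            "manylinux1", "manylinux2014"
        )
    elif "aarch64" in filename:
        x86_wheel = filename.replace("aarch64", "x86_64").replace(
            "manylinux2014", "manylinux1"
        )
        arm_wheel = filename
    else:
        raise ValueError(f"Unsupported wheel: {filename}")
    return x86_wheel, arm_wheel


x86, arm = derive_wheel_pair("vllm-0.9.1+cu128-cp38-abi3-manylinux1_x86_64.whl")
print(arm)                      # vllm-0.9.1+cu128-cp38-abi3-manylinux2014_aarch64.whl
print(arm.replace("+", "%2B"))  # '+' must be escaped to %2B for CloudFront URLs
```

Deriving one architecture's filename from the other keeps the index generator working no matter which wheel triggers it, at the cost of hard-coding the manylinux tag mapping in two places (hence the sync comment pointing at upload-wheels.sh).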

.buildkite/lm-eval-harness/configs/Meta-Llama-3-8B-QQQ.yaml

Lines changed: 0 additions & 12 deletions
This file was deleted.

.buildkite/lm-eval-harness/configs/models-large.txt

Lines changed: 0 additions & 1 deletion
```diff
@@ -3,4 +3,3 @@ Meta-Llama-3-70B-Instruct.yaml
 Mixtral-8x7B-Instruct-v0.1.yaml
 Qwen2-57B-A14-Instruct.yaml
 DeepSeek-V2-Lite-Chat.yaml
-Meta-Llama-3-8B-QQQ.yaml
```

.buildkite/lm-eval-harness/run-lm-eval-gsm-hf-baseline.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@
 # We can use this script to compute baseline accuracy on GSM for transformers.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install lm-eval==0.4.4
+# pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d#egg=lm-eval[api]
 
 usage() {
     echo``
```

.buildkite/lm-eval-harness/run-lm-eval-gsm-vllm-baseline.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@
 # We use this for fp8, which HF does not support.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install lm-eval==0.4.4
+# pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d#egg=lm-eval[api]
 
 usage() {
     echo``
```
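
Both scripts wrap the lm-eval CLI. For reference, a rough equivalent through the library's Python entry point looks like the sketch below; the model, few-shot count, and batch size are illustrative, and it assumes the pinned commit still exposes `simple_evaluate`:

```python
# Sketch only: approximates what the GSM8k baseline scripts do, via the
# lm-eval Python API instead of the CLI. Model and settings are examples,
# not the values used in CI.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # swap to "vllm" to benchmark the vLLM backend instead
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["gsm8k"])  # per-task accuracy metrics
```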

.buildkite/nightly-benchmarks/README.md

Lines changed: 12 additions & 20 deletions
```diff
@@ -7,7 +7,7 @@ This directory contains two sets of benchmark for vllm.
 - Performance benchmark: benchmark vllm's performance under various workload, for **developers** to gain clarity on whether their PR improves/degrades vllm's performance
 - Nightly benchmark: compare vllm's performance against alternatives (tgi, trt-llm and lmdeploy), for **the public** to know when to choose vllm.
 
-See [vLLM performance dashboard](https://perf.vllm.ai) for the latest performance benchmark results and [vLLM GitHub README](https://github.com/vllm-project/vllm/blob/main/README.md) for latest nightly benchmark results.
+See [vLLM performance dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm) for the latest performance benchmark results and [vLLM GitHub README](https://github.com/vllm-project/vllm/blob/main/README.md) for latest nightly benchmark results.
 
 ## Performance benchmark quick overview
 
@@ -138,28 +138,20 @@ The raw benchmarking results (in the format of json files) are in the `Artifacts
 
 The `compare-json-results.py` helps to compare benchmark results JSON files converted using `convert-results-json-to-markdown.py`.
 When run, benchmark script generates results under `benchmark/results` folder, along with the `benchmark_results.md` and `benchmark_results.json`.
-`compare-json-results.py` compares two `benchmark_results.json` files and provides performance ratio e.g. for Output Tput, Median TTFT and Median TPOT.
+`compare-json-results.py` compares two `benchmark_results.json` files and provides performance ratio e.g. for Output Tput, Median TTFT and Median TPOT.
+If only one benchmark_results.json is passed, `compare-json-results.py` compares different TP and PP configurations in the benchmark_results.json instead.
 
-Here is an example using the script to compare result_a and result_b without detail test name.
-`python3 compare-json-results.py -f results_a/benchmark_results.json -f results_b/benchmark_results.json --ignore_test_name`
-
-|    | results_a/benchmark_results.json | results_b/benchmark_results.json | perf_ratio |
-|----|----------------------------------|----------------------------------|------------|
-| 0  | 142.633982                       | 156.526018                       | 1.097396   |
-| 1  | 241.620334                       | 294.018783                       | 1.216863   |
-| 2  | 218.298905                       | 262.664916                       | 1.203235   |
-| 3  | 242.743860                       | 299.816190                       | 1.235113   |
-
-Here is an example using the script to compare result_a and result_b with detail test name.
+Here is an example using the script to compare result_a and result_b with Model, Dataset name, input/output length, max concurrency and qps.
 `python3 compare-json-results.py -f results_a/benchmark_results.json -f results_b/benchmark_results.json`
 
-|   | results_a/benchmark_results.json_name     | results_a/benchmark_results.json | results_b/benchmark_results.json_name     | results_b/benchmark_results.json | perf_ratio |
-|---|-------------------------------------------|----------------------------------|-------------------------------------------|----------------------------------|------------|
-| 0 | serving_llama8B_tp1_sharegpt_qps_1        | 142.633982                       | serving_llama8B_tp1_sharegpt_qps_1        | 156.526018                       | 1.097396   |
-| 1 | serving_llama8B_tp1_sharegpt_qps_16       | 241.620334                       | serving_llama8B_tp1_sharegpt_qps_16       | 294.018783                       | 1.216863   |
-| 2 | serving_llama8B_tp1_sharegpt_qps_4        | 218.298905                       | serving_llama8B_tp1_sharegpt_qps_4        | 262.664916                       | 1.203235   |
-| 3 | serving_llama8B_tp1_sharegpt_qps_inf      | 242.743860                       | serving_llama8B_tp1_sharegpt_qps_inf      | 299.816190                       | 1.235113   |
-| 4 | serving_llama8B_tp2_random_1024_128_qps_1 | 96.613390                        | serving_llama8B_tp4_random_1024_128_qps_1 | 108.404853                       | 1.122048   |
+|    | Model                                 | Dataset Name | Input Len | Output Len | # of max concurrency | qps | results_a/benchmark_results.json | results_b/benchmark_results.json | perf_ratio |
+|----|---------------------------------------|--------------|-----------|------------|----------------------|-----|----------------------------------|----------------------------------|------------|
+| 0  | meta-llama/Meta-Llama-3.1-8B-Instruct | random       | 128       | 128        | 1000                 | 1   | 142.633982                       | 156.526018                       | 1.097396   |
+| 1  | meta-llama/Meta-Llama-3.1-8B-Instruct | random       | 128       | 128        | 1000                 | inf | 241.620334                       | 294.018783                       | 1.216863   |
+
+A comparison diagram will be generated below the table.
+Here is an example to compare between 96c/results_gnr_96c_091_tp2pp3 and 128c/results_gnr_128c_091_tp2pp3:
+<img width="1886" height="828" alt="image" src="https://github.com/user-attachments/assets/c02a43ef-25d0-4fd6-90e5-2169a28682dd" />
 
 ## Nightly test details
 
```
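
For intuition, the perf_ratio column is just the second file's metric divided by the first's, joined on the test identity. A minimal sketch of that computation (the JSON field names here are assumptions, not the script's actual schema):

```python
# Rough sketch of the perf_ratio computation; assumes each
# benchmark_results.json holds a list of records keyed by a test name and
# an output-throughput metric (both field names are assumptions).
import json

def load_results(path: str, metric: str = "output_throughput") -> dict[str, float]:
    with open(path) as f:
        records = json.load(f)
    return {r["test_name"]: float(r[metric]) for r in records}

a = load_results("results_a/benchmark_results.json")
b = load_results("results_b/benchmark_results.json")

# perf_ratio > 1.0 means results_b outperforms results_a on that test.
for name in sorted(a.keys() & b.keys()):
    print(f"{name}: perf_ratio = {b[name] / a[name]:.6f}")
```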

.buildkite/nightly-benchmarks/nightly-descriptions.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -17,7 +17,7 @@ Latest reproduction guide: [github issue link](https://github.com/vllm-project/
 - SGLang: `lmsysorg/sglang:v0.3.2-cu121`
 - LMDeploy: `openmmlab/lmdeploy:v0.6.1-cu12`
 - TensorRT-LLM: `nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
-  - *NOTE: we uses r24.07 as the current implementation only works for this version. We are going to bump this up.*
+  - *NOTE: we use r24.07 as the current implementation only works for this version. We are going to bump this up.*
 - Check [nightly-pipeline.yaml](nightly-pipeline.yaml) for the concrete docker images, specs and commands we use for the benchmark.
 - Hardware
   - 8x Nvidia A100 GPUs
```
