
Commit e11fe22

Merge branch 'main' into conditional-compile
2 parents: bd849d6 + 6b04039

File tree: 183 files changed, +5517 / -2748 lines


.buildkite/nightly-benchmarks/README.md

Lines changed: 12 additions & 20 deletions
@@ -7,7 +7,7 @@ This directory contains two sets of benchmark for vllm.
 - Performance benchmark: benchmark vllm's performance under various workload, for **developers** to gain clarity on whether their PR improves/degrades vllm's performance
 - Nightly benchmark: compare vllm's performance against alternatives (tgi, trt-llm and lmdeploy), for **the public** to know when to choose vllm.

-See [vLLM performance dashboard](https://perf.vllm.ai) for the latest performance benchmark results and [vLLM GitHub README](https://github.com/vllm-project/vllm/blob/main/README.md) for latest nightly benchmark results.
+See [vLLM performance dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm) for the latest performance benchmark results and [vLLM GitHub README](https://github.com/vllm-project/vllm/blob/main/README.md) for latest nightly benchmark results.

 ## Performance benchmark quick overview

@@ -138,28 +138,20 @@ The raw benchmarking results (in the format of json files) are in the `Artifacts

 The `compare-json-results.py` helps to compare benchmark results JSON files converted using `convert-results-json-to-markdown.py`.
 When run, benchmark script generates results under `benchmark/results` folder, along with the `benchmark_results.md` and `benchmark_results.json`.
-`compare-json-results.py` compares two `benchmark_results.json` files and provides performance ratio e.g. for Output Tput, Median TTFT and Median TPOT.
+`compare-json-results.py` compares two `benchmark_results.json` files and provides performance ratio e.g. for Output Tput, Median TTFT and Median TPOT.
+If only one benchmark_results.json is passed, `compare-json-results.py` compares different TP and PP configurations in the benchmark_results.json instead.

-Here is an example using the script to compare result_a and result_b without detail test name.
-`python3 compare-json-results.py -f results_a/benchmark_results.json -f results_b/benchmark_results.json --ignore_test_name`
-
-|    | results_a/benchmark_results.json | results_b/benchmark_results.json | perf_ratio |
-|----|----------------------------------------|----------------------------------------|----------|
-| 0  | 142.633982 | 156.526018 | 1.097396 |
-| 1  | 241.620334 | 294.018783 | 1.216863 |
-| 2  | 218.298905 | 262.664916 | 1.203235 |
-| 3  | 242.743860 | 299.816190 | 1.235113 |
-
-Here is an example using the script to compare result_a and result_b with detail test name.
+Here is an example using the script to compare result_a and result_b with Model, Dataset name, input/output length, max concurrency and qps.
 `python3 compare-json-results.py -f results_a/benchmark_results.json -f results_b/benchmark_results.json`

-| | results_a/benchmark_results.json_name | results_a/benchmark_results.json | results_b/benchmark_results.json_name | results_b/benchmark_results.json | perf_ratio |
-|---|---------------------------------------------|----------------------------------------|---------------------------------------------|----------------------------------------|----------|
-| 0 | serving_llama8B_tp1_sharegpt_qps_1 | 142.633982 | serving_llama8B_tp1_sharegpt_qps_1 | 156.526018 | 1.097396 |
-| 1 | serving_llama8B_tp1_sharegpt_qps_16 | 241.620334 | serving_llama8B_tp1_sharegpt_qps_16 | 294.018783 | 1.216863 |
-| 2 | serving_llama8B_tp1_sharegpt_qps_4 | 218.298905 | serving_llama8B_tp1_sharegpt_qps_4 | 262.664916 | 1.203235 |
-| 3 | serving_llama8B_tp1_sharegpt_qps_inf | 242.743860 | serving_llama8B_tp1_sharegpt_qps_inf | 299.816190 | 1.235113 |
-| 4 | serving_llama8B_tp2_random_1024_128_qps_1 | 96.613390 | serving_llama8B_tp4_random_1024_128_qps_1 | 108.404853 | 1.122048 |
+| | Model | Dataset Name | Input Len | Output Len | # of max concurrency | qps | results_a/benchmark_results.json | results_b/benchmark_results.json | perf_ratio |
+|----|---------------------------------------|--------|-----|-----|------|-----|-----------|----------|----------|
+| 0 | meta-llama/Meta-Llama-3.1-8B-Instruct | random | 128 | 128 | 1000 | 1 | 142.633982 | 156.526018 | 1.097396 |
+| 1 | meta-llama/Meta-Llama-3.1-8B-Instruct | random | 128 | 128 | 1000 | inf | 241.620334 | 294.018783 | 1.216863 |
+
+A comparison diagram will be generated below the table.
+Here is an example to compare between 96c/results_gnr_96c_091_tp2pp3 and 128c/results_gnr_128c_091_tp2pp3
+<img width="1886" height="828" alt="image" src="https://github.com/user-attachments/assets/c02a43ef-25d0-4fd6-90e5-2169a28682dd" />

 ## Nightly test details

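For readers who want to sanity-check a perf_ratio by hand, the idea the README describes reduces to dividing one run's metric column by the other's. Below is a minimal, self-contained pandas sketch of that idea (not part of this commit); the file paths and the single metric are illustrative assumptions.

```python
# Hypothetical illustration of the perf_ratio column; not the script's code.
import pandas as pd

metric = "Output Tput (tok/s)"  # one of the metrics the script compares
a = pd.read_json("results_a/benchmark_results.json")  # assumed path
b = pd.read_json("results_b/benchmark_results.json")  # assumed path

ratio = pd.DataFrame(
    {
        "results_a": a[metric],
        "results_b": b[metric],
        # perf_ratio > 1 means results_b achieved higher throughput than results_a
        "perf_ratio": b[metric] / a[metric],
    }
)
print(ratio)
```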
compare-json-results.py

Lines changed: 162 additions & 13 deletions
@@ -1,24 +1,38 @@
 # SPDX-License-Identifier: Apache-2.0
 # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
 import argparse
+import json
+import os

 import pandas as pd


 def compare_data_columns(
-    files, name_column, data_column, drop_column, ignore_test_name=False
+    files, name_column, data_column, info_cols, drop_column, debug=False
 ):
     print("\ncompare_data_column: " + data_column)
     frames = []
+    raw_data_cols = []
     compare_frames = []
     for file in files:
         data_df = pd.read_json(file)
         serving_df = data_df.dropna(subset=[drop_column], ignore_index=True)
-        if ignore_test_name is False:
+        # Show all info columns in the first couple columns
+        if not frames:
+            for col in info_cols:
+                if col not in serving_df.columns:
+                    print(f"Skipping missing column: {col}")
+                    continue
+                frames.append(serving_df[col])
+        # only show test name under debug mode
+        if debug is True:
             serving_df = serving_df.rename(columns={name_column: file + "_name"})
             frames.append(serving_df[file + "_name"])
+
+        file = "/".join(file.split("/")[:-1])
         serving_df = serving_df.rename(columns={data_column: file})
         frames.append(serving_df[file])
+        raw_data_cols.append(file)
         compare_frames.append(serving_df[file])
         if len(compare_frames) >= 2:
             # Compare numbers among two files
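As a rough mental model of what the revised compare_data_columns assembles (a hedged sketch with made-up numbers, not the commit's code): the shared info columns are added once, each file's data column is renamed after the file's directory, and everything is concatenated column-wise.

```python
# Toy illustration of the column-wise concat pattern; all values are invented.
import pandas as pd

info = pd.DataFrame({"Model": ["llama-8B", "llama-8B"], "qps": [1, 16]})
a = pd.Series([142.63, 241.62], name="results_a")  # data column from file A
b = pd.Series([156.53, 294.02], name="results_b")  # data column from file B

# Info columns first, then one data column per input file, side by side.
table = pd.concat([info, a, b], axis=1)
print(table)
```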
@@ -27,7 +41,68 @@ def compare_data_columns(
             compare_frames.pop(1)

     concat_df = pd.concat(frames, axis=1)
-    return concat_df
+    print(raw_data_cols)
+    return concat_df, raw_data_cols
+
+
+def split_json_by_tp_pp(
+    input_file: str = "benchmark_results.json", output_root: str = "."
+) -> list[str]:
+    """
+    Split a benchmark JSON into separate folders by (TP Size, PP Size).
+
+    Creates: <output_root>/tp{TP}_pp{PP}/benchmark_results.json
+    Returns: list of file paths written.
+    """
+    # Load JSON data into DataFrame
+    with open(input_file, encoding="utf-8") as f:
+        data = json.load(f)
+
+    # If the JSON is a dict with a list under common keys, use that list
+    if isinstance(data, dict):
+        for key in ("results", "serving_results", "benchmarks", "data"):
+            if isinstance(data.get(key), list):
+                data = data[key]
+                break
+
+    df = pd.DataFrame(data)
+
+    # Handle alias column names
+    rename_map = {
+        "tp_size": "TP Size",
+        "tensor_parallel_size": "TP Size",
+        "pp_size": "PP Size",
+        "pipeline_parallel_size": "PP Size",
+    }
+    df.rename(
+        columns={k: v for k, v in rename_map.items() if k in df.columns}, inplace=True
+    )
+
+    # Ensure TP/PP columns exist (default to 1 if missing)
+    if "TP Size" not in df.columns:
+        df["TP Size"] = 1
+    if "PP Size" not in df.columns:
+        df["PP Size"] = 1
+
+    # make sure TP/PP are numeric ints with no NaN
+    df["TP Size"] = (
+        pd.to_numeric(df.get("TP Size", 1), errors="coerce").fillna(1).astype(int)
+    )
+    df["PP Size"] = (
+        pd.to_numeric(df.get("PP Size", 1), errors="coerce").fillna(1).astype(int)
+    )
+
+    # Split into separate folders
+    saved_paths: list[str] = []
+    for (tp, pp), group_df in df.groupby(["TP Size", "PP Size"], dropna=False):
+        folder_name = os.path.join(output_root, f"tp{int(tp)}_pp{int(pp)}")
+        os.makedirs(folder_name, exist_ok=True)
+        filepath = os.path.join(folder_name, "benchmark_results.json")
+        group_df.to_json(filepath, orient="records", indent=2, force_ascii=False)
+        print(f"Saved: {filepath}")
+        saved_paths.append(filepath)
+
+    return saved_paths


 if __name__ == "__main__":
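To see how the new single-file mode hangs together, here is a hedged, self-contained toy example of the same split-by-(TP Size, PP Size) pattern. The records, values, and the `splits` output root are invented; only the column names and the folder naming mirror the function above.

```python
# Toy stand-in for a real benchmark_results.json; all values are invented.
import json
import os

import pandas as pd

records = [
    {"Test name": "serving_llama8B_tp1_qps_1", "tp_size": 1, "pp_size": 1, "Output Tput (tok/s)": 142.6},
    {"Test name": "serving_llama8B_tp2_qps_1", "tp_size": 2, "pp_size": 1, "Output Tput (tok/s)": 96.6},
]
with open("benchmark_results.json", "w", encoding="utf-8") as f:
    json.dump(records, f)

df = pd.read_json("benchmark_results.json").rename(
    columns={"tp_size": "TP Size", "pp_size": "PP Size"}
)
# One folder per (TP Size, PP Size), each holding its own benchmark_results.json,
# which the comparison step can then treat as separate inputs to compare.
for (tp, pp), group in df.groupby(["TP Size", "PP Size"]):
    folder = os.path.join("splits", f"tp{tp}_pp{pp}")
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "benchmark_results.json")
    group.to_json(path, orient="records", indent=2)
    print("Saved:", path)
```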
@@ -36,31 +111,105 @@ def compare_data_columns(
         "-f", "--file", action="append", type=str, help="input file name"
     )
     parser.add_argument(
-        "--ignore_test_name", action="store_true", help="ignore_test_name or not"
+        "--debug", action="store_true", help="show all information for debugging"
+    )
+    parser.add_argument(
+        "--plot",
+        action=argparse.BooleanOptionalAction,
+        default=True,
+        help="plot perf diagrams or not --no-plot --plot",
+    )
+    parser.add_argument(
+        "-x",
+        "--xaxis",
+        type=str,
+        default="# of max concurrency.",
+        help="column name to use as X Axis in comparision graph",
     )
     args = parser.parse_args()
-    files = args.file
-    print("comparing : " + ", ".join(files))

     drop_column = "P99"
     name_column = "Test name"
+    info_cols = [
+        "Model",
+        "Dataset Name",
+        "Input Len",
+        "Output Len",
+        "TP Size",
+        "PP Size",
+        "# of max concurrency.",
+        "qps",
+    ]
     data_cols_to_compare = ["Output Tput (tok/s)", "Median TTFT (ms)", "Median"]
     html_msgs_for_data_cols = [
         "Compare Output Tokens /n",
         "Median TTFT /n",
         "Median TPOT /n",
     ]
-    ignore_test_name = args.ignore_test_name
+
+    if len(args.file) == 1:
+        files = split_json_by_tp_pp(args.file[0], output_root="splits")
+        info_cols = [c for c in info_cols if c not in ("TP Size", "PP Size")]
+    else:
+        files = args.file
+    print("comparing : " + ", ".join(files))
+    debug = args.debug
+    plot = args.plot
+    # For Plot feature, assign y axis from one of info_cols
+    y_axis_index = info_cols.index(args.xaxis) if args.xaxis in info_cols else 6
     with open("perf_comparison.html", "w") as text_file:
         for i in range(len(data_cols_to_compare)):
-            output_df = compare_data_columns(
+            output_df, raw_data_cols = compare_data_columns(
                 files,
                 name_column,
                 data_cols_to_compare[i],
+                info_cols,
                 drop_column,
-                ignore_test_name=ignore_test_name,
+                debug=debug,
             )
-            print(output_df)
-            html = output_df.to_html()
-            text_file.write(html_msgs_for_data_cols[i])
-            text_file.write(html)
+
+            # For Plot feature, insert y axis from one of info_cols
+            raw_data_cols.insert(0, info_cols[y_axis_index])
+
+            filtered_info_cols = info_cols[:-2]
+            existing_group_cols = [
+                c for c in filtered_info_cols if c in output_df.columns
+            ]
+            if not existing_group_cols:
+                raise ValueError(
+                    f"No valid group-by columns "
+                    f"Expected subset: {filtered_info_cols}, "
+                    f"but DataFrame has: {list(output_df.columns)}"
+                )
+
+            output_df_sorted = output_df.sort_values(by=existing_group_cols)
+            output_groups = output_df_sorted.groupby(existing_group_cols, dropna=False)
+            for name, group in output_groups:
+                html = group.to_html()
+                text_file.write(html_msgs_for_data_cols[i])
+                text_file.write(html)
+
+                if plot is True:
+                    import pandas as pd
+                    import plotly.express as px
+
+                    df = group[raw_data_cols]
+                    df_sorted = df.sort_values(by=info_cols[y_axis_index])
+                    # Melt DataFrame for plotting
+                    df_melted = df_sorted.melt(
+                        id_vars=info_cols[y_axis_index],
+                        var_name="Configuration",
+                        value_name=data_cols_to_compare[i],
+                    )
+                    title = data_cols_to_compare[i] + " vs " + info_cols[y_axis_index]
+                    # Create Plotly line chart
+                    fig = px.line(
+                        df_melted,
+                        x=info_cols[y_axis_index],
+                        y=data_cols_to_compare[i],
+                        color="Configuration",
+                        title=title,
+                        markers=True,
+                    )
+                    # Export to HTML
+                    text_file.write(fig.to_html(full_html=True, include_plotlyjs="cdn"))
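The plotting block above follows a common pandas + Plotly Express pattern: melt the wide comparison table so each input file becomes one "Configuration" series, then draw one line per configuration against the chosen x-axis column. Below is a hedged, self-contained sketch of that pattern with invented numbers; the column names mirror the script, but nothing here is the commit's code.

```python
# Minimal melt + px.line sketch (toy values; requires plotly to be installed).
import pandas as pd
import plotly.express as px

wide = pd.DataFrame(
    {
        "# of max concurrency.": [1, 16, 64],
        "results_a": [142.6, 241.6, 242.7],  # Output Tput for run A (invented)
        "results_b": [156.5, 294.0, 299.8],  # Output Tput for run B (invented)
    }
)
# Wide -> long: one row per (x value, configuration) pair.
long_df = wide.melt(
    id_vars="# of max concurrency.",
    var_name="Configuration",
    value_name="Output Tput (tok/s)",
)
fig = px.line(
    long_df,
    x="# of max concurrency.",
    y="Output Tput (tok/s)",
    color="Configuration",
    markers=True,
)
fig.write_html("perf_plot_example.html", include_plotlyjs="cdn")
```

With the new flags, a typical invocation (paths illustrative) would be `python3 compare-json-results.py -f results_a/benchmark_results.json -f results_b/benchmark_results.json --no-plot` to skip the diagrams, or a single `-f` to compare the TP/PP splits of one run.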
