Commit 35d4529

Merge branch 'master' into system_check
2 parents: 4c6478d + 27db053


44 files changed: +3472 −1114 lines

.github/workflows/build_wheels.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -124,10 +124,10 @@ jobs:
           name: wheels-macos-latest
           path: wheels
       # Download the built wheels from macOS-13 (x86)
-      - name: Download macOS-13 (x86) wheels
+      - name: Download macOS-15 (x86) wheels
         uses: actions/download-artifact@v4
         with:
-          name: wheels-macos-13
+          name: wheels-macos-15
           path: wheels
       # Download the built wheels from Windows
       - name: Download Windows wheels
```
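For context, a `download-artifact` step can only fetch an artifact whose name matches what the producing job uploaded. The fragment below is a hypothetical sketch of the matching upload step (the step name, job layout, and paths are assumptions; only the artifact name `wheels-macos-15` comes from the diff above):

```yaml
      # Hypothetical upload step in the macOS build job; the artifact name must
      # equal the one used by the "Download macOS-15 (x86) wheels" step above.
      - name: Upload macOS-15 (x86) wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-macos-15
          path: wheels
```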

.gitmodules

Lines changed: 3 additions & 0 deletions
```diff
@@ -10,3 +10,6 @@
 [submodule "language/deepseek-r1/submodules/LiveCodeBench"]
 	path = language/deepseek-r1/submodules/LiveCodeBench
 	url = https://github.com/LiveCodeBench/LiveCodeBench
+[submodule "text_to_video/wan2.2-t2v-14b/submodules/VBench"]
+	path = text_to_video/wan2.2-t2v-14b/submodules/VBench
+	url = https://github.com/Vchitect/VBench
```

docs/submission/index.md

Lines changed: 233 additions & 218 deletions
Large diffs are not rendered by default.

docs/submission/submission-cli.md

Lines changed: 231 additions & 0 deletions
---
hide:
- toc
---

Click [here](https://docs.google.com/presentation/d/1cmbpZUpVr78EIrhzyMBnnWnjJrD-mZ2vmSb-yETkTA8/edit?usp=sharing) to view the proposal slide for Common Automation for MLPerf Inference Submission Generation through MLCFlow.

Please refer to the [installation page](site:inference/install/) to install MLCFlow for automating the submission generation. In a typical development environment, `pip install mlc-scripts` should be enough.

=== "Custom automation based MLPerf results"

    If you have not followed the `mlcr` commands under the individual model pages in the [benchmarks](../index.md) directory, please make sure that the result directory is structured in the following way. You can see real examples of the expected folder structure [here](https://github.com/mlcommons/inference/tree/submission-generation-examples).

    ```
    └── System description ID(SUT Name)
        ├── system_meta.json
        └── Benchmark
            └── Scenario
                ├── Performance
                |   └── run_1 run for all scenarios
                |       ├── mlperf_log_summary.txt
                |       └── mlperf_log_detail.txt
                ├── Accuracy
                |   ├── mlperf_log_summary.txt
                |   ├── mlperf_log_detail.txt
                |   ├── mlperf_log_accuracy.json
                |   └── accuracy.txt
                ├── Compliance_Test_ID
                |   ├── Performance
                |   |   └── run_x/#1 run for all scenarios
                |   |       ├── mlperf_log_summary.txt
                |   |       └── mlperf_log_detail.txt
                |   ├── Accuracy  # for TEST01 only
                |   |   ├── baseline_accuracy.txt (if test fails in deterministic mode)
                |   |   ├── compliance_accuracy.txt (if test fails in deterministic mode)
                |   |   ├── mlperf_log_accuracy.json
                |   |   └── accuracy.txt
                |   ├── verify_performance.txt
                |   └── verify_accuracy.txt  # for TEST01 only
                ├── user.conf
                └── measurements.json
    ```

    <details>
    <summary>Click here if you are submitting in the open division</summary>

    * A `model_mapping.json` file should be included inside the SUT folder; it maps each custom model's full name to the official model name. The format of the JSON file is:

    ```json
    {
        "custom_model_name_for_model1": "official_model_name_for_model1",
        "custom_model_name_for_model2": "official_model_name_for_model2"
    }
    ```
    </details>

=== "MLC automation based results"

    If you have followed the `mlcr` commands under the individual model pages in the [benchmarks](../index.md) directory, all the valid results will get aggregated in the `mlc cache` folder. The following command can be used to browse the structure of the inference results folder generated by MLCFlow.

    ### Get results folder structure

    === "Unix Terminal"

        ```bash
        mlc find cache --tags=get,mlperf,inference,results,dir | xargs tree
        ```

    === "Windows PowerShell"

        ```powershell
        mlc find cache --tags=get,mlperf,inference,results,dir | ForEach-Object { Get-ChildItem -Recurse $_ }
        ```

Once all the results across all the models are ready, you can use the below section to generate a valid submission tree compliant with the [MLPerf requirements](https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#inference-1).

## Generate submission folder

The submission generation flow is explained in the diagram below.

```mermaid
flowchart LR
    subgraph Generation [Submission Generation SUT1]
      direction TB
      A[populate system details] --> B[generate submission structure]
      B --> C[truncate-accuracy-logs]
      C --> D{Infer low latency results <br>and/or<br> filter out invalid results}
      D -- yes --> E[preprocess-mlperf-inference-submission]
      D -- no --> F[run-mlperf-inference-submission-checker]
      E --> F
    end
    Input((Results SUT1)) --> Generation
    Generation --> Output((Submission Folder <br> SUT1))
```

### Command to generate submission folder

```bash
mlcr generate,inference,submission \
   --clean \
   --preprocess_submission=yes \
   --run_checker=yes \
   --submitter=MLCommons \
   --division=closed \
   --env.MLC_DETERMINE_MEMORY_CONFIGURATION=yes \
   --quiet
```

!!! tip
    * Use `--hw_name="My system name"` to give a meaningful system name. Examples can be seen [here](https://github.com/mlcommons/inference_results_v3.0/tree/main/open/cTuning/systems).

    * Use `--submitter=<Your name>` if your organization is an official MLCommons member and would like to submit under your organization.

    * Use the `--hw_notes_extra` option to add additional notes, e.g. `--hw_notes_extra="Result taken by NAME"`.

    * Use the `--results_dir` option to specify the results folder. It is automatically taken from the MLC cache for MLPerf automation based runs.

    * Use the `--submission_dir` option to specify the submission folder. (You can avoid this if you are pushing to GitHub or only running a single SUT; MLC will then use its cache folder.)

    * Use `--division=open` for an open division submission.

    * Use the `--category` option to specify the category for which the submission is generated (datacenter/edge). By default, the category is taken from the `system_meta.json` file located in the SUT root directory.

    * Use `--submission_base_dir` to specify the directory to which the outputs from the preprocess submission script and the final submission are added. There is no need to provide `--submission_dir` along with this. For `docker run`, use `--submission_base_dir` instead of `--submission_dir`.


If MLPerf results are collected on multiple systems, the same process needs to be repeated on each of them. Once we have submission folders on all the SUTs, we need to sync them into a single submission folder.

=== "Sync Locally"

    If you have results on multiple systems, you need to merge them onto one system. You can use `rsync` for this. For example, the command below will sync the submission folder from SUT2 to the one in SUT1.

    ```
    rsync -avz username@host1:<path_to_submission_folder2>/ <path_to_submission_folder1>/
    ```

    The same needs to be repeated for all other SUTs so that we have the full submission on SUT1.

    ```mermaid
    flowchart LR
      subgraph SUT1 [Submission Generation SUT1]
        A[Submission Folder SUT1]
      end
      subgraph SUT2 [Submission Generation SUT2]
        B[Submission Folder SUT2]
      end
      subgraph SUT3 [Submission Generation SUT3]
        C[Submission Folder SUT3]
      end
      subgraph SUTN [Submission Generation SUTN]
        D[Submission Folder SUTN]
      end
      SUT2 --> SUT1
      SUT3 --> SUT1
      SUTN --> SUT1
    ```

=== "Sync via a GitHub repo"

    If you are collecting results across multiple systems, you can generate separate submissions, aggregate all of them in a GitHub repository (which can be private), and use it to generate a single tarball that can be uploaded to the [MLCommons Submission UI](https://submissions-ui.mlcommons.org/submission).

    Run the following command after **replacing `--repo_url` with your GitHub repository URL**.

    ```bash
    mlcr push,github,mlperf,inference,submission \
       --repo_url=https://github.com/mlcommons/mlperf_inference_submissions_v5.0 \
       --commit_message="Results on <HW name> added by <Name>" \
       --quiet
    ```

    > **Note:** The path to the locally synced submission directory from the output below can be used in the next step by passing it to the `--submission_dir` argument.

    <details>
    <summary>Click to see the sample output</summary>
    ```
    [2025-07-23 16:36:56,399 module.py:2197 INFO] -
    Path to the locally synced submission directory: mysubmissions/mlperf_submission
    ```
    </details>

    ```mermaid
    flowchart LR
      subgraph SUT1 [Submission Generation SUT1]
        A[Submission Folder SUT1]
      end
      subgraph SUT2 [Submission Generation SUT2]
        B[Submission Folder SUT2]
      end
      subgraph SUT3 [Submission Generation SUT3]
        C[Submission Folder SUT3]
      end
      subgraph SUTN [Submission Generation SUTN]
        D[Submission Folder SUTN]
      end
      SUT2 -- git sync and push --> G[Github Repo]
      SUT3 -- git sync and push --> G[Github Repo]
      SUTN -- git sync and push --> G[Github Repo]
      SUT1 -- git sync and push --> G[Github Repo]
    ```

## Upload the final submission

!!! warning
    If you are using GitHub to consolidate your results, make sure that you have run the [`push-to-github` command](#__tabbed_2_2) on the same system, to ensure results are synced as-is on the GitHub repository.

Once you have all the results on one system, you can upload them to the MLCommons submission server as follows:

=== "via CLI"

    Run the following command, which will run the submission checker and upload the results to the MLCommons submission server.

    ```
    mlcr run,mlperf,submission,checker,inference \
       --submitter_id=<> \
       --submission_dir=<Path to the locally synced submission directory> --quiet
    ```

=== "via Browser"

    Run the following command to generate the final submission tar file and then upload it to the [MLCommons Submission UI](https://submissions-ui.mlcommons.org/submission).

    ```
    mlcr run,mlperf,submission,checker,inference \
       --submission_dir=<Path to the submission folder> \
       --tar=yes \
       --submission_tar_file=mysubmission.tar.gz --quiet
    ```

```mermaid
flowchart LR
    subgraph SUT [Combined Submissions]
      A[Combined Submission Folder in SUT1]
    end
    SUT --> B[Run submission checker]
    B --> C[Upload to MLC Submission server]
    C --> D[Receive validation email]
```

<!--Click [here](https://youtu.be/eI1Hoecc3ho) to view the recording of the workshop: Streamlining your MLPerf Inference results using CM.-->
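As an illustrative aid (not part of the new doc or the MLC tooling), the results layout that this page expects from custom automation can be scaffolded with a short script. The SUT, benchmark, and scenario names below are hypothetical placeholders, and the compliance-test subtree is omitted for brevity:

```python
from pathlib import Path

def scaffold_results(root: str, sut: str, benchmark: str, scenario: str) -> Path:
    """Create the bare results skeleton described in submission-cli.md.

    Produces: <root>/<sut>/system_meta.json and, per benchmark/scenario,
    Performance/run_1, Accuracy, user.conf, and measurements.json.
    """
    base = Path(root) / sut
    base.mkdir(parents=True, exist_ok=True)
    (base / "system_meta.json").touch()

    scenario_dir = base / benchmark / scenario
    perf = scenario_dir / "Performance" / "run_1"
    acc = scenario_dir / "Accuracy"
    perf.mkdir(parents=True, exist_ok=True)
    acc.mkdir(parents=True, exist_ok=True)

    # Loadgen output logs expected by the submission checker
    for name in ("mlperf_log_summary.txt", "mlperf_log_detail.txt"):
        (perf / name).touch()
    for name in ("mlperf_log_summary.txt", "mlperf_log_detail.txt",
                 "mlperf_log_accuracy.json", "accuracy.txt"):
        (acc / name).touch()

    # Scenario-level files from the documented tree
    (scenario_dir / "user.conf").touch()
    (scenario_dir / "measurements.json").touch()
    return base

if __name__ == "__main__":
    out = scaffold_results("results", "my_sut", "resnet50", "Offline")
    print((out / "system_meta.json").exists())
```

The empty files are placeholders only; in a real run they are produced by loadgen and the accuracy scripts.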

loadgen/README_BUILD.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -12,7 +12,7 @@
 pip install absl-py numpy
 git clone --recurse-submodules https://github.com/mlcommons/inference.git mlperf_inference
 cd mlperf_inference/loadgen
-CFLAGS="-std=c++14 -O3" python -m pip install .
+python -m pip install .

 This will fetch the loadgen source, build and install the loadgen as a python module, and run a simple end-to-end demo.
```

loadgen/VERSION.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-6.0.1
+6.0.3
```

loadgen/bindings/python_api.cc

Lines changed: 10 additions & 6 deletions
```diff
@@ -312,6 +312,8 @@ PYBIND11_MODULE(mlperf_loadgen, m) {
                      &TestSettings::server_num_issue_query_threads)
       .def_readwrite("offline_expected_qps",
                      &TestSettings::offline_expected_qps)
+      .def_readwrite("sample_concatenate_permutation",
+                     &TestSettings::sample_concatenate_permutation)
       .def_readwrite("min_duration_ms", &TestSettings::min_duration_ms)
       .def_readwrite("max_duration_ms", &TestSettings::max_duration_ms)
       .def_readwrite("min_query_count", &TestSettings::min_query_count)
@@ -324,6 +326,14 @@ PYBIND11_MODULE(mlperf_loadgen, m) {
                      &TestSettings::accuracy_log_rng_seed)
       .def_readwrite("accuracy_log_probability",
                      &TestSettings::accuracy_log_probability)
+      .def_readwrite("accuracy_log_sampling_target",
+                     &TestSettings::accuracy_log_sampling_target)
+      .def_readwrite("test05", &TestSettings::test05)
+      .def_readwrite("test05_qsl_rng_seed", &TestSettings::test05_qsl_rng_seed)
+      .def_readwrite("test05_sample_index_rng_seed",
+                     &TestSettings::test05_sample_index_rng_seed)
+      .def_readwrite("test05_schedule_rng_seed",
+                     &TestSettings::test05_schedule_rng_seed)
       .def_readwrite("print_timestamps", &TestSettings::print_timestamps)
       .def_readwrite("performance_issue_unique",
                      &TestSettings::performance_issue_unique)
@@ -333,12 +343,6 @@ PYBIND11_MODULE(mlperf_loadgen, m) {
                      &TestSettings::performance_issue_same_index)
       .def_readwrite("performance_sample_count_override",
                      &TestSettings::performance_sample_count_override)
-      .def_readwrite("test05", &TestSettings::test05)
-      .def_readwrite("test05_qsl_rng_seed", &TestSettings::test05_qsl_rng_seed)
-      .def_readwrite("test05_sample_index_rng_seed",
-                     &TestSettings::test05_sample_index_rng_seed)
-      .def_readwrite("test05_schedule_rng_seed",
-                     &TestSettings::test05_schedule_rng_seed)
       .def_readwrite("use_token_latencies", &TestSettings::use_token_latencies)
       .def_readwrite("ttft_latency", &TestSettings::server_ttft_latency)
       .def_readwrite("tpot_latency", &TestSettings::server_tpot_latency)
```

loadgen/mlperf.conf

Lines changed: 8 additions & 2 deletions
```diff
@@ -27,6 +27,7 @@ pointpainting.*.performance_sample_count_override = 1024
 deepseek-r1.*.performance_sample_count_override = 4388
 deepseek-r1-interactive.*.performance_sample_count_override = 4388
 whisper.*.performance_sample_count_override = 1633
+qwen3-vl-235b-a22b.*.performance_sample_count_override = 48289
 # set to 0 to let entire sample set to be performance sample
 3d-unet.*.performance_sample_count_override = 0
@@ -69,7 +70,7 @@ llama3_1-8b-interactive.*.sample_concatenate_permutation = 1
 deepseek-r1.*.sample_concatenate_permutation = 1
 deepseek-r1-interactive.*.sample_concatenate_permutation = 1
 whisper.*.sample_concatenate_permutation = 1
-
+qwen3-vl-235b-a22b.*.sample_concatenate_permutation = 1
 *.Server.target_latency = 10
 *.Server.target_latency_percentile = 99
 *.Server.target_duration = 0
@@ -94,7 +95,9 @@ llama3_1-8b-interactive.*.use_token_latencies = 1
 deepseek-r1.*.use_token_latencies = 1
 deepseek-r1-interactive.*.use_token_latencies = 1
 whisper.*.use_token_latencies = 1
-
+# For the VLM benchmark, the model response is relatively short, therefore we track
+# end-to-end latency instead of token latencies.
+qwen3-vl-235b-a22b.*.use_token_latencies = 0
 # gptj benchmark infers token latencies
 gptj.*.infer_token_latencies = 1
 gptj.*.token_latency_scaling_factor = 69
@@ -140,6 +143,8 @@ deepseek-r1-interactive.Server.target_latency = 0
 deepseek-r1-interactive.Server.ttft_latency = 1500
 deepseek-r1-interactive.Server.tpot_latency = 15

+qwen3-vl-235b-a22b.Server.target_latency = 12000
+
 *.Offline.target_latency_percentile = 90
 *.Offline.min_duration = 600000
@@ -164,6 +169,7 @@ mixtral-8x7b.Offline.min_query_count = 15000
 rgat.Offline.min_query_count = 788379
 deepseek-r1.Offline.min_query_count = 4388
 whisper.Offline.min_query_count = 1633
+qwen3-vl-235b-a22b.Offline.min_query_count = 48289

 # These fields should be defined and overridden by user.conf.
 *.SingleStream.target_latency = 10
```
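The keys in `mlperf.conf` follow a `benchmark.scenario.parameter = value` pattern, where `*` wildcards the benchmark or scenario. As an illustration only (this is not loadgen's actual parser), a minimal sketch of how a specific benchmark/scenario pair resolves such overrides, with later matching lines winning:

```python
from fnmatch import fnmatch

def parse_conf(text):
    """Parse mlperf.conf-style lines into (model, scenario, key, value) tuples.

    Simplification: assumes model names contain no dots, which holds for the
    model IDs in this file.
    """
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or "=" not in line:
            continue
        lhs, value = (part.strip() for part in line.split("=", 1))
        model, scenario, key = lhs.split(".", 2)
        entries.append((model, scenario, key, value))
    return entries

def lookup(entries, model, scenario, key):
    """Return the last matching value; '*' in the conf matches anything."""
    result = None
    for m, s, k, v in entries:
        if k == key and fnmatch(model, m) and fnmatch(scenario, s):
            result = v
    return result

# Small excerpt of the settings added/changed in this commit
conf = """
whisper.*.use_token_latencies = 1
# VLM responses are short, so end-to-end latency is tracked instead
qwen3-vl-235b-a22b.*.use_token_latencies = 0
qwen3-vl-235b-a22b.Offline.min_query_count = 48289
*.Offline.min_duration = 600000
"""

entries = parse_conf(conf)
print(lookup(entries, "qwen3-vl-235b-a22b", "Offline", "min_query_count"))  # -> 48289
print(lookup(entries, "qwen3-vl-235b-a22b", "Server", "use_token_latencies"))  # -> 0
```

Note how the `qwen3-vl-235b-a22b.*` line matches any scenario, while the `Offline`-scoped line applies only there.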
