Commit a5f12c7

Author: github-actions (committed)
Auto-merge updates from master branch
2 parents 70fcbe0 + 088bb82

3 files changed: +44, -1 lines

README.md

Lines changed: 26 additions & 0 deletions
```diff
@@ -17,6 +17,32 @@ Please see the [MLPerf Inference benchmark paper](https://arxiv.org/abs/1911.025
 Please see [here](https://docs.mlcommons.org/inference/benchmarks/) for the MLPerf inference documentation website which includes automated commands to run MLPerf inference benchmarks using different implementations.
 
+## MLPerf Inference v5.1 (submission deadline July 25, 2025)
+
+For submissions, please use the master branch and any commit since the [5.1 seed release (soon to be released)](), although it is best to use the latest commit in the [master branch](https://github.com/mlcommons/inference).
+
+For power submissions, please use [SPEC PTD 1.11.1](https://github.com/mlcommons/power) (needs special access) and any commit of the power-dev repository after the [code-freeze](https://github.com/mlcommons/power-dev/commit/c4b3ad8202fbd8ac28d77149e5e7aeadb725bbf2).
+
+| model | reference app | framework | dataset | category |
+| ---- | ---- | ---- | ---- | ---- |
+| retinanet 800x800 | [vision/classification_and_detection](https://github.com/mlcommons/inference/tree/master/vision/classification_and_detection) | pytorch, onnx | openimages resized to 800x800 | edge, datacenter |
+| bert | [language/bert](https://github.com/mlcommons/inference/tree/master/language/bert) | tensorflow, pytorch, onnx | squad-1.1 | edge |
+| dlrm-v2 | [recommendation/dlrm_v2](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch) | pytorch | Multihot Criteo Terabyte | datacenter |
+| 3d-unet | [vision/medical_imaging/3d-unet-kits19](https://github.com/mlcommons/inference/tree/master/vision/medical_imaging/3d-unet-kits19) | pytorch, tensorflow, onnx | KiTS19 | datacenter |
+| stable-diffusion-xl | [text_to_image](https://github.com/mlcommons/inference/tree/master/text_to_image) | pytorch | COCO 2014 | edge, datacenter |
+| llama2-70b | [language/llama2-70b](https://github.com/mlcommons/inference/tree/master/language/llama2-70b) | pytorch | OpenOrca | datacenter |
+| llama3.1-405b | [language/llama3-405b](https://github.com/mlcommons/inference/tree/master/language/llama3.1-405b) | pytorch | LongBench, LongDataCollections, Ruler, GovReport | datacenter |
+| mixtral-8x7b | [language/mixtral-8x7b](https://github.com/mlcommons/inference/tree/master/language/mixtral-8x7b) | pytorch | OpenOrca, MBXP, GSM8K | datacenter |
+| rgat | [graph/rgat](https://github.com/mlcommons/inference/tree/master/graph/R-GAT) | pytorch | IGBH | datacenter |
+| pointpainting | [automotive/3d-object-detection](https://github.com/mlcommons/inference/tree/master/automotive/3d-object-detection) | pytorch, onnx | Waymo Open Dataset | edge |
+| llama3.1-8b | [language/llama3.1-8b](https://github.com/mlcommons/inference/tree/master/language/llama3.1-8b) | pytorch | CNN-Daily Mail | edge, datacenter |
+| deepseek-r1 | [language/deepseek-r1](https://github.com/mlcommons/inference/tree/master/language/deepseek-r1) | pytorch | mlperf_deepseek_r1 | datacenter |
+| whisper | [speech2text](https://github.com/mlcommons/inference/tree/master/speech2text) | pytorch | LibriSpeech | edge, datacenter |
+
+* Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.
+
 ## MLPerf Inference v5.0 (submission deadline February 28, 2025)
 
 For submissions, please use the master branch and any commit since the [5.0 seed release](https://github.com/mlcommons/inference/commit/5d83ed5de438ffb55bca4cdb2966fba90a9dbca6) although it is best to use the latest commit in the [master branch](https://github.com/mlcommons/inference).
```
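The seed-release rule above ("any commit since the seed release") amounts to a mechanical check: the seed commit must be an ancestor of the checked-out `HEAD`. A minimal sketch, demonstrated on a throwaway repository so it runs anywhere; in a real clone of mlcommons/inference you would point `repo` at the clone and use the published seed hash (e.g. the 5.0 seed `5d83ed5de438ffb55bca4cdb2966fba90a9dbca6`). The helper name `git` is ours, not part of any MLPerf tooling.

```python
# Sketch: verify a checkout is "at or after" a seed release, i.e. the seed
# commit is an ancestor of HEAD. Uses a throwaway repo for self-containment.
import subprocess
import tempfile

def git(repo, *args):
    """Run a git command in `repo` and return the CompletedProcess."""
    return subprocess.run(
        ["git", "-C", repo, *args],
        check=False, capture_output=True, text=True,
    )

identity = ["-c", "user.email=ci@example.com", "-c", "user.name=ci"]
repo = tempfile.mkdtemp()
git(repo, "init", "-q")
git(repo, *identity, "commit", "-q", "--allow-empty", "-m", "seed release")
seed = git(repo, "rev-parse", "HEAD").stdout.strip()  # stand-in for the published seed hash
git(repo, *identity, "commit", "-q", "--allow-empty", "-m", "later work")

# A valid submission commit contains the seed in its history:
ok = git(repo, "merge-base", "--is-ancestor", seed, "HEAD").returncode == 0
print("at or after seed:", ok)  # -> at or after seed: True
```

`git merge-base --is-ancestor` exits 0 exactly when the first commit is reachable from the second, which is the property the submission rule asks for.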

loadgen/VERSION.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-5.0.17
+5.0.18
```

loadgen/mlperf.conf

Lines changed: 17 additions & 0 deletions
```diff
@@ -20,6 +20,8 @@ llama3_1-405b.*.performance_sample_count_override = 8313
 stable-diffusion-xl.*.performance_sample_count_override = 5000
 rgat.*.performance_sample_count_override = 788379
 pointpainting.*.performance_sample_count_override = 1024
+deepseek-r1.*.performance_sample_count_override = 4388
+deepseek-r1-interactive.*.performance_sample_count_override = 4388
 # set to 0 to let entire sample set to be performance sample
 3d-unet.*.performance_sample_count_override = 0
 
@@ -55,6 +57,8 @@ llama2-70b.*.sample_concatenate_permutation = 1
 llama2-70b-interactive.*.sample_concatenate_permutation = 1
 mixtral-8x7b.*.sample_concatenate_permutation = 1
 llama3_1-405b.*.sample_concatenate_permutation = 1
+deepseek-r1.*.sample_concatenate_permutation = 1
+deepseek-r1-interactive.*.sample_concatenate_permutation = 1
 
 *.Server.target_latency = 10
 *.Server.target_latency_percentile = 99
@@ -73,6 +77,9 @@ llama2-70b.*.use_token_latencies = 1
 llama2-70b-interactive.*.use_token_latencies = 1
 mixtral-8x7b.*.use_token_latencies = 1
 llama3_1-405b.*.use_token_latencies = 1
+deepseek-r1.*.use_token_latencies = 1
+deepseek-r1-interactive.*.use_token_latencies = 1
+
 # gptj benchmark infers token latencies
 gptj.*.infer_token_latencies = 1
 gptj.*.token_latency_scaling_factor = 69
@@ -94,6 +101,14 @@ llama3_1-405b.Server.target_latency = 0
 llama3_1-405b.Server.ttft_latency = 6000
 llama3_1-405b.Server.tpot_latency = 175
 
+deepseek-r1.Server.target_latency = 0
+deepseek-r1.Server.ttft_latency = 2000
+deepseek-r1.Server.tpot_latency = 80
+
+deepseek-r1-interactive.Server.target_latency = 0
+deepseek-r1-interactive.Server.ttft_latency = 1000
+deepseek-r1-interactive.Server.tpot_latency = 40
+
 *.Offline.target_latency_percentile = 90
 *.Offline.min_duration = 600000
 
@@ -114,6 +129,8 @@ llama2-70b.Offline.min_query_count = 24576
 llama3_1-405b.Offline.min_query_count = 8313
 mixtral-8x7b.Offline.min_query_count = 15000
 rgat.Offline.min_query_count = 788379
+deepseek-r1.Offline.min_query_count = 4388
+deepseek-r1-interactive.Offline.min_query_count = 4388
 
 # These fields should be defined and overridden by user.conf.
 *.SingleStream.target_latency = 10
```
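The settings added by this commit all follow mlperf.conf's `model.scenario.key = value` scheme, where `*` wildcards the model or scenario position and an exact entry (e.g. `deepseek-r1.Server.ttft_latency`) takes precedence over a wildcard one (e.g. `*.Server.target_latency`). A minimal sketch of that resolution order in Python; this is an illustration, not loadgen's actual parser, and the names `parse_conf`/`lookup` are hypothetical:

```python
# Illustrative parser for mlperf.conf-style lines: "model.scenario.key = value",
# with "*" as a wildcard. NOT loadgen's real implementation.
def parse_conf(text):
    """Map (model, scenario, key) -> value, ignoring comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        model, scenario, field = key.strip().split(".", 2)
        settings[(model, scenario, field)] = value.strip()
    return settings

def lookup(settings, model, scenario, field):
    """Resolve a setting: an exact model/scenario match wins over '*' wildcards."""
    for m, s in ((model, scenario), (model, "*"), ("*", scenario), ("*", "*")):
        if (m, s, field) in settings:
            return settings[(m, s, field)]
    return None

conf_text = """
*.Server.target_latency = 10
deepseek-r1.*.performance_sample_count_override = 4388
deepseek-r1.Server.ttft_latency = 2000
"""
settings = parse_conf(conf_text)
print(lookup(settings, "deepseek-r1", "Server", "ttft_latency"))    # -> 2000
print(lookup(settings, "deepseek-r1", "Server", "target_latency"))  # -> 10
```

In the real stack, loadgen reads mlperf.conf for these defaults and a submitter's user.conf is applied on top; the comment in the final hunk marks the fields expected to be overridden there.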
