
Commit 5bc70eb

Author: github-actions
Auto-merge updates from master branch
2 parents b3e548e + 9374959, commit 5bc70eb


54 files changed: +21994 -1 lines

loadgen/VERSION.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-6.0.3
+6.0.4

loadgen/mlperf.conf

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ retinanet.Server.target_latency = 100
 bert.Server.target_latency = 130
 dlrm.Server.target_latency = 60
 dlrm-v2.Server.target_latency = 60
+dlrm-v3.Server.target_latency = 80
 rnnt.Server.target_latency = 1000
 gptj.Server.target_latency = 20000
 stable-diffusion-xl.Server.target_latency = 20000
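
For reference, entries in this file follow a `model.scenario.key = value` pattern; benchmark-tunable keys such as `target_qps` can typically be overridden in a submitter's `user.conf`, while `mlperf.conf` values like `target_latency` are fixed constraints. A hypothetical local override might look like this (values are illustrative, not official targets):
```
# user.conf: hypothetical local overrides for experimentation
dlrm-v3.Server.target_qps = 100
dlrm-v3.Offline.target_qps = 200
```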

recommendation/dlrm_v3/README.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# MLPerf Inference reference implementation for DLRMv3

## Install dependencies and build loadgen

The reference implementation has been tested on a single host with x86_64 CPUs and 8 NVIDIA H100/B200 GPUs. Dependencies can be installed with:
```
sh setup.sh
```

## Dataset download

DLRMv3 uses a synthetic dataset specifically designed to match the model and system characteristics of large-scale sequential recommendation (a large item set and a long average sequence length per request). To generate the dataset used for both training and inference, run
```
python streaming_synthetic_data.py
```
The generated dataset is 2 TB in size and contains 5 million users interacting with a billion items over 100 timestamps.

Only 1% of the dataset is used in the inference benchmark. The sampled DLRMv3 dataset and trained checkpoint are available at https://inference.mlcommons-storage.org/.

Script to download the sampled dataset used in the inference benchmark:
```
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/dlrm-v3-dataset.uri
```
Script to download the 1 TB trained checkpoint:
```
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/dlrm-v3-checkpoint.uri
```

## Inference benchmark

```
WORLD_SIZE=8 python main.py --dataset sampled-streaming-100b
```

`WORLD_SIZE` is the number of GPUs used in the inference benchmark.

```
usage: main.py [-h] [--dataset {streaming-100b,sampled-streaming-100b}] [--model-path MODEL_PATH] [--scenario-name {Server,Offline}] [--batchsize BATCHSIZE]
               [--output-trace OUTPUT_TRACE] [--data-producer-threads DATA_PRODUCER_THREADS] [--compute-eval COMPUTE_EVAL] [--find-peak-performance FIND_PEAK_PERFORMANCE]
               [--dataset-path-prefix DATASET_PATH_PREFIX] [--warmup-ratio WARMUP_RATIO] [--num-queries NUM_QUERIES] [--target-qps TARGET_QPS] [--numpy-rand-seed NUMPY_RAND_SEED]
               [--sparse-quant SPARSE_QUANT] [--dataset-percentage DATASET_PERCENTAGE]

options:
  -h, --help            show this help message and exit
  --dataset {streaming-100b,sampled-streaming-100b}
                        name of the dataset
  --model-path MODEL_PATH
                        path to the model checkpoint. Example: /home/username/ckpts/streaming_100b/89/
  --scenario-name {Server,Offline}
                        inference benchmark scenario
  --batchsize BATCHSIZE
                        batch size used in the benchmark
  --output-trace OUTPUT_TRACE
                        Whether to output trace
  --data-producer-threads DATA_PRODUCER_THREADS
                        Number of threads used in data producer
  --compute-eval COMPUTE_EVAL
                        If true, will run AccuracyOnly mode and output both predictions and labels for accuracy calculations
  --find-peak-performance FIND_PEAK_PERFORMANCE
                        Whether to find peak performance in the benchmark
  --dataset-path-prefix DATASET_PATH_PREFIX
                        Prefix to the dataset path. Example: /home/username/
  --warmup-ratio WARMUP_RATIO
                        The ratio of the dataset used to warm up the SUT
  --num-queries NUM_QUERIES
                        Number of queries to run in the benchmark
  --target-qps TARGET_QPS
                        Benchmark target QPS. Needs to be tuned for different implementations to balance latency and throughput
  --numpy-rand-seed NUMPY_RAND_SEED
                        Numpy random seed
  --sparse-quant SPARSE_QUANT
                        Whether to quantize sparse arch
  --dataset-percentage DATASET_PERCENTAGE
                        Percentage of the dataset to run in the benchmark
```
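
For example, a Server-scenario run on 8 GPUs that pins the batch size and target QPS might look like the following (the flag values here are illustrative, not tuned recommendations):
```
WORLD_SIZE=8 python main.py --dataset sampled-streaming-100b --scenario-name Server --batchsize 32 --target-qps 10
```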

## Accuracy test

Setting `run.compute_eval` will run the accuracy test and dump prediction outputs in `mlperf_log_accuracy.json`. To check the accuracy, run

```
python accuracy.py --path path/to/mlperf_log_accuracy.json
```
We use normalized entropy (NE), accuracy, and AUC as the metrics to evaluate model quality. For accepted submissions, all three metrics (NE, accuracy, AUC) must be within 99% of the reference implementation values. The reference implementation's metrics, evaluated on 34,996 requests across 10 inference timestamps, are listed below:
```
NE: 86.687%
Accuracy: 69.651%
AUC: 78.663%
```
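
As a rough illustration of the headline metric (this is the generic definition of normalized entropy, not the benchmark's `MetricsLogger` implementation), NE is commonly computed as the average log loss of the predictions divided by the log loss of a constant predictor that always outputs the empirical positive rate:
```
# Hypothetical sketch of normalized entropy (NE) for binary labels and
# probability predictions; not code from this commit.
import numpy as np

def normalized_entropy(preds: np.ndarray, labels: np.ndarray) -> float:
    eps = 1e-12
    preds = np.clip(preds, eps, 1 - eps)
    # Average cross-entropy (log loss) of the model's predictions.
    log_loss = -np.mean(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))
    # Log loss of a constant predictor emitting the base positive rate.
    p = np.clip(labels.mean(), eps, 1 - eps)
    baseline = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return float(log_loss / baseline)
```
Under this definition, an NE below 100% means the model beats the constant base-rate predictor.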

recommendation/dlrm_v3/accuracy.py

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# pyre-strict
"""
Tool to calculate accuracy for loadgen accuracy output found in mlperf_log_accuracy.json
"""

import argparse
import json
import logging

import numpy as np
import torch
from configs import get_hstu_configs
from utils import MetricsLogger

logger: logging.Logger = logging.getLogger("main")


def get_args() -> argparse.Namespace:
    """Parse commandline."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--path",
        required=True,
        help="path to mlperf_log_accuracy.json",
    )
    args = parser.parse_args()
    return args


def main() -> None:
    """
    Main function to calculate accuracy metrics from loadgen output.

    Reads the mlperf_log_accuracy.json file, parses the results, and computes
    accuracy metrics using the MetricsLogger. Each result entry contains
    predictions, labels, and weights packed as float32 numpy arrays.
    """
    args = get_args()
    logger.warning("Parsing loadgen accuracy log...")
    with open(args.path, "r") as f:
        results = json.load(f)
    hstu_config = get_hstu_configs(dataset="sampled-streaming-100b")
    metrics = MetricsLogger(
        multitask_configs=hstu_config.multitask_configs,
        batch_size=1,
        window_size=3000,
        device=torch.device("cpu"),
        rank=0,
    )
    logger.warning(f"results have {len(results)} entries")
    for result in results:
        # Each entry's hex-encoded payload decodes to a float32 buffer laid
        # out as [predictions | labels | weights | num_candidates].
        data = np.frombuffer(bytes.fromhex(result["data"]), np.float32)
        num_candidates = data[-1].astype(int)
        assert len(data) == 1 + num_candidates * 3
        mt_target_preds = torch.from_numpy(data[0:num_candidates])
        mt_target_labels = torch.from_numpy(data[num_candidates : num_candidates * 2])
        mt_target_weights = torch.from_numpy(
            data[num_candidates * 2 : num_candidates * 3]
        )
        num_candidates = torch.tensor([num_candidates])
        metrics.update(
            predictions=mt_target_preds.view(1, -1),
            labels=mt_target_labels.view(1, -1),
            weights=mt_target_weights.view(1, -1),
            num_candidates=num_candidates,
        )
    for k, v in metrics.compute().items():
        logger.warning(f"{k}: {v}")


if __name__ == "__main__":
    main()
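
For SUT implementers, here is a minimal sketch of the packing side, inferred from the parsing logic above; the helper name and signature are hypothetical, not part of this commit:
```
# Hypothetical helper mirroring the float32 layout accuracy.py expects:
# [predictions | labels | weights | num_candidates].
import numpy as np

def pack_accuracy_buffer(
    preds: np.ndarray, labels: np.ndarray, weights: np.ndarray
) -> bytes:
    assert len(preds) == len(labels) == len(weights)
    buf = np.concatenate([preds, labels, weights, [len(preds)]])
    return buf.astype(np.float32).tobytes()
```
LoadGen hex-encodes each response buffer into the `data` field of `mlperf_log_accuracy.json`, which is why the parser above calls `bytes.fromhex` before `np.frombuffer`.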
