
Commit 23b50eb

Merge branch 'master' into arjunsuresh-patch-2
2 parents cc25ce4 + 44d9192 commit 23b50eb


44 files changed (+8811 −40 lines)

.github/workflows/test-submission-generation.yml

Lines changed: 6 additions & 40 deletions
@@ -9,44 +9,10 @@ on:
       - '.github/workflows/test-submission-generation.yml'
       - '**'
       - '!**.md'
+
 jobs:
-  submission_generation:
-    runs-on: ${{ matrix.os }}
-    strategy:
-      fail-fast: false
-      matrix:
-        os: [ubuntu-latest, windows-latest, macos-latest]
-        python-version: [ "3.12" ]
-        division: ["closed", "open", "closed-open"]
-        category: ["datacenter", "edge"]
-        case: ["closed"]
-        action: ["run", "docker"]
-        exclude:
-          - os: macos-latest
-          - os: windows-latest
-          - category: "edge"
-
-    steps:
-      - uses: actions/checkout@v4
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v3
-        with:
-          python-version: ${{ matrix.python-version }}
-      - name: Install dependencies
-        run: |
-          pip install mlc-scripts
-      - name: Pull repo where test cases are uploaded
-        run: |
-          git clone -b submission-generation-examples https://github.com/mlcommons/inference.git submission_generation_examples
-      - name: Run Submission Generation - ${{ matrix.case }} ${{ matrix.action }} ${{ matrix.category }} ${{ matrix.division }}
-        continue-on-error: true
-        run: |
-          if [ "${{ matrix.case }}" == "closed" ]; then
-            description="Test submission - contains closed edge and datacenter"
-          elif [ "${{ matrix.case }}" == "closed-power" ]; then
-            description="Test submission - contains closed-power edge and datacenter results"
-          fi
-          # Dynamically set the log group to simulate a dynamic step name
-          echo "::group::$description"
-          mlc ${{ matrix.action }} script --tags=generate,inference,submission --adr.compiler.tags=gcc --version=v5.0 --clean --preprocess_submission=yes --submission_base_dir=mysubmissions --results_dir=$PWD/submission_generation_tests/${{ matrix.case }}/ --run-checker --submitter=MLCommons --tar=yes --division=${{ matrix.division }} --env.MLC_DETERMINE_MEMORY_CONFIGURATION=yes --quiet
-          mlc ${{ matrix.action }} script --tags=run,submission,checker --submitter_id_off=mysubmitter_id --tar=yes --submission_dir=mysubmissions/submissions --submission_tar_file=mysubmission.tar.gz
+  run-tests:
+    uses: mlcommons/mlperf-automations/.github/workflows/test-mlperf-inference-submission-generation.yml@dev
+    with:
+      ref: ${{ github.event.pull_request.head.ref }}
+      repo-url: ${{ github.event.pull_request.head.repo.html_url }}
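
The new job simply forwards the pull request's head branch (`ref`) and fork URL (`repo-url`) to a reusable workflow in mlcommons/mlperf-automations. As a sketch of how a reusable workflow consumes such values (illustrative only — this is not the contents of the referenced file, and the trivial job is an assumption), the called side declares them under `workflow_call`:

```yaml
# Illustrative only; the real workflow lives in mlcommons/mlperf-automations.
on:
  workflow_call:
    inputs:
      ref:
        type: string
      repo-url:
        type: string

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Testing ${{ inputs.repo-url }} at ${{ inputs.ref }}"
```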

.gitmodules

Lines changed: 6 additions & 0 deletions
@@ -4,3 +4,9 @@
 [submodule "vision/medical_imaging/3d-unet-brats19/nnUnet"]
 	path = vision/medical_imaging/3d-unet-brats19/nnUnet
 	url = https://github.com/MIC-DKFZ/nnUNet.git
+[submodule "language/deepseek-r1/submodules/prm800k"]
+	path = language/deepseek-r1/submodules/prm800k
+	url = https://github.com/openai/prm800k
+[submodule "language/deepseek-r1/submodules/LiveCodeBench"]
+	path = language/deepseek-r1/submodules/LiveCodeBench
+	url = https://github.com/LiveCodeBench/LiveCodeBench
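
Existing clones will not have these two new submodules populated. A standard way to fetch just these paths (plain git usage, not something mandated by this commit) is:

```bash
# Fetch only the newly registered DeepSeek-R1 submodules into an existing clone
git submodule update --init \
    language/deepseek-r1/submodules/prm800k \
    language/deepseek-r1/submodules/LiveCodeBench
```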

language/deepseek-r1/.gitignore

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
*.pyc
.nvimrc.lua
__pycache__/
.venv/
build/
data/
mlperf_results/
remove_dev_files.sh
.cursor
.venv_*

language/deepseek-r1/README.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@

# MLPerf Inference DeepSeek Reference Implementation

## Model Download

> **Model**: [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) (revision: `56d4cbbb4d29f4355bab4b9a39ccb717a14ad5ad`)

- The DeepSeek-R1 model is downloaded automatically as part of setup.
- Checkpoint conversion is done transparently when needed.

## Dataset Download

### Preprocessed

You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.

To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
To install Rclone on Linux/macOS/BSD systems, run:
```
sudo -v ; curl https://rclone.org/install.sh | sudo bash
```
Once Rclone is installed, run the following command to authenticate with the bucket:
```
rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```
You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:

```
rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl ./ -P
```

### Calibration

Download and install Rclone as described in the previous section.

Then navigate in the terminal to your desired download directory and run the following command to download the calibration dataset:

```
rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl ./ -P
```

## Docker

The MLPerf DeepSeek reference implementation includes a comprehensive Docker launch system that supports multiple backends and provides advanced features like user management, persistent storage, and flexible configuration.

### Launch a Backend-Specific Container

Launch a Docker container with your preferred backend:

```bash
# Launch PyTorch backend
./launch_docker.sh --backend pytorch

# Launch vLLM backend
./launch_docker.sh --backend vllm

# Launch SGLang backend
./launch_docker.sh --backend sglang

# See launch_docker.sh for the full list of args
./launch_docker.sh --backend vllm --gpu-count 2 --extra-mounts "/data:/data,/models:/models" --local-user 0
```

### Available Backends

- **pytorch**: via [deepseek-ai/DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) (reference implementation by DeepSeek-AI)
- **vllm**: vLLM's LLM API-based inference
- **sglang**: SGLang's OpenAI-endpoint-based inference

## Backend-Specific Setup

After launching any Docker container, run the setup script, which automatically detects your backend:

```bash
# Automatic backend detection and setup
setup.sh
```

The setup script creates a virtual environment and configures it differently based on the backend:

#### All Backends
- The virtual environment is **activated** after `setup.sh` completes
- Activate the backend-specific venv (e.g. in a new shell) using `source .venv_[pytorch|vllm|sglang]/bin/activate`
- All commands are to be run inside the virtual environment (a typical sequence is sketched below)

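As referenced above, a typical sequence for the vLLM backend looks like this — these are just the commands from the Docker and setup sections in order, not an additional script; the other backends are analogous:

```bash
# On the host: launch the vLLM container
./launch_docker.sh --backend vllm

# Inside the container: create and activate the backend venv
setup.sh
source .venv_vllm/bin/activate
```
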
## Running Evaluations

### PyTorch Backend (Distributed)

> ⚠️ **IMPORTANT NOTE**: The PyTorch reference implementation takes approximately 8 days to run on an H200x8 system. This is because the large maximum output sequence length (32K) limits concurrency to a small batch size (max-BS=16), and the PyTorch forward and decode logic is unoptimized.

The PyTorch backend uses distributed execution with `torchrun` and `run_eval_mpi.py`:

```bash
# Regular inference evaluation
(.venv_pytorch) $ torchrun --nproc_per_node=8 run_eval_mpi.py --input-file <input_dataset>.pkl --output-file pytorch_output.pkl --num-samples 32

# MLPerf performance benchmarks
(.venv_pytorch) $ torchrun --nproc_per_node=8 run_mlperf_mpi.py --mode offline --input-file <input_dataset>.pkl --output-dir mlperf_results

# MLPerf accuracy mode
(.venv_pytorch) $ torchrun --nproc_per_node=8 run_mlperf_mpi.py --mode offline --accuracy --input-file <input_dataset>.pkl --output-dir mlperf_results
```

### vLLM and SGLang Backends

For vLLM and SGLang, use single-process execution in `run_eval.py`:

```bash
# Regular inference evaluation
(.venv_vllm) $ python run_eval.py --input-file <input_dataset>.pkl
(.venv_sglang) $ python run_eval.py --input-file <input_dataset>.pkl

# MLPerf performance benchmarks
(.venv_vllm) $ python run_mlperf.py --mode offline --input-file <input_dataset>.pkl --output-dir mlperf_results
(.venv_sglang) $ python run_mlperf.py --mode server --input-file <input_dataset>.pkl --output-dir mlperf_results
```

## MLPerf Inference Support

The reference implementation includes full support for MLPerf inference benchmarks through a System Under Test (SUT) wrapper that integrates with MLPerf LoadGen.

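For orientation, the sketch below shows the general shape of such a LoadGen integration: a SUT callback that runs queries and reports completions, plus a QSL describing the dataset. It is a minimal illustration against the public `mlperf_loadgen` API, not the actual wrapper behind `run_mlperf.py`; the `run_model` placeholder and the sample bookkeeping are assumptions.

```python
import array
import mlperf_loadgen as lg

def run_model(sample_indices):
    """Placeholder for backend inference; returns one byte string per sample."""
    return [b"generated text" for _ in sample_indices]

def issue_queries(query_samples):
    # LoadGen hands us QuerySample objects; run them and report completions.
    outputs = run_model([qs.index for qs in query_samples])
    responses, buffers = [], []  # keep buffers alive until completion is reported
    for qs, out in zip(query_samples, outputs):
        buf = array.array("B", out)
        addr, _ = buf.buffer_info()
        buffers.append(buf)
        responses.append(lg.QuerySampleResponse(qs.id, addr, len(out)))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

total_samples = 4388  # size of the preprocessed dataset
sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(total_samples, total_samples,
                      lambda samples: None, lambda samples: None)

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly
lg.StartTest(sut, qsl, settings)

lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```
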
### Running MLPerf Benchmarks

#### Offline Scenario
```bash
(.venv_BACKEND) $ python run_mlperf.py \
    --mode offline \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```

#### Server Scenario
```bash
(.venv_BACKEND) $ python run_mlperf.py \
    --mode server \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```

#### PyTorch Backend for MLPerf

The PyTorch backend uses distributed execution with `torchrun` and `run_mlperf_mpi.py`:

```bash
# PyTorch MLPerf offline scenario
(.venv_BACKEND) $ torchrun --nproc_per_node=8 run_mlperf_mpi.py \
    --mode offline \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```

### MLPerf Command Line Options

| Option         | Description                    | Default          |
| -------------- | ------------------------------ | ---------------- |
| `--mode`       | Scenario mode (offline/server) | `offline`        |
| `--accuracy`   | Run accuracy test              | `False`          |
| `--output-dir` | Output directory for results   | `mlperf_results` |

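For example, an offline accuracy run simply adds `--accuracy` to the command shown above (same placeholder input file):

```bash
(.venv_BACKEND) $ python run_mlperf.py \
    --mode offline \
    --accuracy \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```
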
### Backend Support Matrix

The following table shows which backends support different evaluation and MLPerf operations:

| Backend     | `run_eval.py` | `run_mlperf.py --mode=offline` | `run_mlperf.py --mode=server` |
| ----------- | ------------- | ------------------------------ | ----------------------------- |
| pytorch-fp8 | x             | x                              |                               |
| vllm-fp8    | x             | x                              |                               |
| sglang-fp8  | x             | x                              | x                             |

> **Note**: For the PyTorch backend, use the `_mpi` versions with `torchrun`. For the vLLM and SGLang backends, use the single-process versions without `_mpi`.

## Accuracy Evaluation

Accuracy evaluation is handled uniformly across all backends:

```bash
# within container, with virtualenv activated
(.venv_BACKEND) $ python3 eval_accuracy.py --input-file <input_file>.pkl
```

### Reference Evals

PyTorch reference scores:

```bash
Evaluation Results: {
  "mean-accuracy": 81.67730173199635,
  "mean-output-tok-len": 4043.449863263446,
  "num-samples": 4388
}
```

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
"""
Modular backend system for MLPerf DeepSeek reference implementation.

Supports TensorRT-LLM, SGLang, vLLM, and PyTorch backends with shared API arguments
but independent execution implementations.
"""

from .base_backend import BaseBackend

# Note: Specific backend implementations are imported dynamically as needed
# to avoid dependency issues when only using certain backends
__all__ = [
    'BaseBackend',
]
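
The module docstring notes that concrete backends are imported dynamically, so that a missing optional dependency (e.g. vLLM) only matters when that backend is actually selected. The commit does not show the dispatch code; one common way to implement such lazy loading is sketched below — the `get_backend` helper, the registry contents, and the module/class names are illustrative assumptions, not the repository's actual API.

```python
import importlib

# Hypothetical registry mapping a backend name to (module, class).
# The real module and class names in the repo may differ.
_BACKENDS = {
    "pytorch": ("backends.pytorch_backend", "PyTorchBackend"),
    "vllm": ("backends.vllm_backend", "VLLMBackend"),
    "sglang": ("backends.sglang_backend", "SGLangBackend"),
}

def get_backend(name: str):
    """Import the requested backend lazily and return its class.

    Importing only on first use means a missing optional dependency
    breaks things only if that backend is actually requested.
    """
    module_name, class_name = _BACKENDS[name]
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```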
