# MLPerf Inference DeepSeek Reference Implementation

## Model Download

> **Model**: [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) (revision: `56d4cbbb4d29f4355bab4b9a39ccb717a14ad5ad`)

- The DeepSeek-R1 model is downloaded automatically as part of setup.
- Checkpoint conversion is done transparently when needed.

## Dataset Download

### Preprocessed

You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.

To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
To install Rclone on Linux/macOS/BSD systems, run:
```
sudo -v ; curl https://rclone.org/install.sh | sudo bash
```
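
To confirm the installation, you can print the installed version (a standard Rclone command):

```bash
# Check that rclone is available on the PATH
rclone version
```
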
Once Rclone is installed, run the following command to authenticate with the bucket:
```
rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```
You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:

```
rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl ./ -P
```
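
For example (the target directory below is illustrative), you can download into a dedicated data directory and confirm that the file arrived:

```bash
# Download the preprocessed dataset into ./data and verify the file is present
mkdir -p ./data && cd ./data
rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl ./ -P
ls -lh mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl
```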

### Calibration

Download and install Rclone as described in the previous section.

Then navigate in the terminal to your desired download directory and run the following command to download the calibration dataset:

```
rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl ./ -P
```

## Docker

The MLPerf DeepSeek reference implementation includes a comprehensive Docker launch system that supports multiple backends and provides advanced features such as user management, persistent storage, and flexible configuration.

### Launch a Backend-Specific Container

Launch a Docker container with your preferred backend:

```bash
# Launch PyTorch backend
./launch_docker.sh --backend pytorch

# Launch vLLM backend
./launch_docker.sh --backend vllm

# Launch SGLang backend
./launch_docker.sh --backend sglang

# See launch_docker.sh for the full list of arguments
./launch_docker.sh --backend vllm --gpu-count 2 --extra-mounts "/data:/data,/models:/models" --local-user 0
```
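
For example, to make a locally downloaded dataset visible inside the container, the `--extra-mounts` flag shown above can be combined with a backend selection (the host path is illustrative):

```bash
# Mount the directory containing the downloaded .pkl files into the container at /data
./launch_docker.sh --backend vllm --extra-mounts "/path/to/deepseek_data:/data"
```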

### Available Backends

- **pytorch**: via [deepseek-ai/DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) (reference implementation by DeepSeek-AI)
- **vllm**: vLLM's LLM API-based inference
- **sglang**: SGLang's OpenAI endpoint-based inference

## Backend-Specific Setup

After launching any Docker container, run the setup script, which automatically detects your backend:

```bash
# Automatic backend detection and setup
setup.sh
```

The setup script creates a virtual environment and configures it differently based on the backend:

#### All Backends
- The virtual environment is **activated** after `setup.sh` completes
- Re-activate the backend-specific venv in a new shell with `source .venv_[pytorch|vllm|sglang]/bin/activate` (see the sketch after this list)
- All subsequent commands should be run inside the virtual environment
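
A minimal sketch of re-activating a backend venv in a new shell inside the container, using the vLLM venv name from the list above:

```bash
# Activate the vLLM virtual environment created by setup.sh and confirm it is active
source .venv_vllm/bin/activate
which python
```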

## Running Evaluations

### PyTorch Backend (Distributed)

> ⚠️ **IMPORTANT NOTE**: The PyTorch reference implementation takes approximately 8 days to run on an H200x8 system. This is because the large maximum output sequence length (max-OSL of 32K) limits concurrency (max-BS of 16), and the PyTorch forward and decode logic is unoptimized.

The PyTorch backend uses distributed execution with `torchrun` and `run_eval_mpi.py`:

```bash
# Regular inference evaluation
(.venv_pytorch) $ torchrun --nproc_per_node=8 run_eval_mpi.py --input-file <input_dataset>.pkl --output-file pytorch_output.pkl --num-samples 32

# MLPerf performance benchmarks
(.venv_pytorch) $ torchrun --nproc_per_node=8 run_mlperf_mpi.py --mode offline --input-file <input_dataset>.pkl --output-dir mlperf_results

# MLPerf accuracy mode
(.venv_pytorch) $ torchrun --nproc_per_node=8 run_mlperf_mpi.py --mode offline --accuracy --input-file <input_dataset>.pkl --output-dir mlperf_results
```

### vLLM and SGLang Backends

For vLLM and SGLang, use single-process execution via `run_eval.py`:

```bash
# Regular inference evaluation
(.venv_vllm) $ python run_eval.py --input-file <input_dataset>.pkl
(.venv_sglang) $ python run_eval.py --input-file <input_dataset>.pkl

# MLPerf performance benchmarks
(.venv_vllm) $ python run_mlperf.py --mode offline --input-file <input_dataset>.pkl --output-dir mlperf_results
(.venv_sglang) $ python run_mlperf.py --mode server --input-file <input_dataset>.pkl --output-dir mlperf_results
```

## MLPerf Inference Support

The reference implementation includes full support for MLPerf inference benchmarks through a System Under Test (SUT) wrapper that integrates with MLPerf LoadGen.

### Running MLPerf Benchmarks

#### Offline Scenario
```bash
(.venv_BACKEND) $ python run_mlperf.py \
    --mode offline \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```

#### Server Scenario
```bash
(.venv_BACKEND) $ python run_mlperf.py \
    --mode server \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```

#### PyTorch Backend for MLPerf

The PyTorch backend uses distributed execution with `torchrun` and `run_mlperf_mpi.py`:

```bash
# PyTorch MLPerf offline scenario
(.venv_pytorch) $ torchrun --nproc_per_node=8 run_mlperf_mpi.py \
    --mode offline \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```

### MLPerf Command Line Options

| Option         | Description                    | Default          |
| -------------- | ------------------------------ | ---------------- |
| `--mode`       | Scenario mode (offline/server) | `offline`        |
| `--accuracy`   | Run accuracy test              | `False`          |
| `--output-dir` | Output directory for results   | `mlperf_results` |

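For example, combining the options above, an accuracy run in the offline scenario looks like this (the venv name is a placeholder for whichever backend you are using):

```bash
(.venv_BACKEND) $ python run_mlperf.py \
    --mode offline \
    --accuracy \
    --input-file <input_dataset>.pkl \
    --output-dir mlperf_results
```
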
### Backend Support Matrix

The following table shows which backends support the different evaluation and MLPerf operations:

| Backend     | `run_eval.py` | `run_mlperf.py --mode=offline` | `run_mlperf.py --mode=server` |
| ----------- | ------------- | ------------------------------ | ----------------------------- |
| pytorch-fp8 | x             | x                              |                               |
| vllm-fp8    | x             | x                              |                               |
| sglang-fp8  | x             | x                              | x                             |

> **Note**: For the PyTorch backend, use the `_mpi` versions with `torchrun`. For the vLLM and SGLang backends, use the single-process versions without `_mpi`.

## Accuracy Evaluation

Accuracy evaluation is handled uniformly across all backends:

```bash
# within container, with virtualenv activated
(.venv_BACKEND) $ python3 eval_accuracy.py --input-file <input_file>.pkl
```
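
For example, to score the output of the PyTorch evaluation run shown earlier (the file name comes from that example):

```bash
(.venv_pytorch) $ python3 eval_accuracy.py --input-file pytorch_output.pkl
```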

### Reference Evals

PyTorch reference scores:

```
Evaluation Results: {
  "mean-accuracy": 81.67730173199635,
  "mean-output-tok-len": 4043.449863263446,
  "num-samples": 4388
}
```