# Quantize DeepSeek models to FP4

This example demonstrates the steps to quantize DeepSeek models to FP4 and export a unified checkpoint that can be deployed with TRT-LLM.

|
## Setup

Due to the model size, quantizing the FP8 model currently requires 8xH200 or 16xH100; this example uses 8xH200.

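As a quick sanity check before launching the steps below, you can count the GPUs visible on the node (an optional step, not part of the original workflow; it assumes `nvidia-smi` is installed):

```shell
# Optional sanity check: count visible GPUs before starting quantization,
# since this workflow expects 8xH200 (or 16xH100).
if command -v nvidia-smi >/dev/null 2>&1; then
  NUM_GPUS=$(nvidia-smi --list-gpus | wc -l)
else
  NUM_GPUS=0
  echo "nvidia-smi not found; cannot verify GPU count"
fi
echo "Visible GPUs: ${NUM_GPUS}"
```
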
|
## Convert the HF checkpoint for DeepSeek FP8 inference

```bash
# set up variables to run the example
export HF_FP8_CKPT={path_to_downloaded_hf_checkpoint}
export DS_CKPT={path_to_save_converted_checkpoint}
export FP4_QUANT_PATH={path_to_save_quantization_results}
export HF_FP4_PATH={path_to_save_the_final_FP4_checkpoint}
```

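The `{...}` values above are placeholders for paths you choose. A concrete assignment might look like the following (hypothetical example paths; adjust them to your storage layout):

```shell
# example values for the placeholders above (hypothetical paths; adjust to your system)
export HF_FP8_CKPT=/data/models/DeepSeek-R1-FP8
export DS_CKPT=/data/models/DeepSeek-R1-converted
export FP4_QUANT_PATH=/data/models/DeepSeek-R1-fp4-quant
export HF_FP4_PATH=/data/models/DeepSeek-R1-FP4
```
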
### DeepSeek V3 / R1 / V3.1

|
```bash
# download the FP8 checkpoint from Hugging Face; this example uses DeepSeek-R1
huggingface-cli download deepseek-ai/DeepSeek-R1 --local-dir $HF_FP8_CKPT

|
# clone the DeepSeek-V3 (base model of R1) GitHub repository for FP8 inference
git clone https://github.com/deepseek-ai/DeepSeek-V3.git && cd DeepSeek-V3 && git checkout 1398800
```

### [Experimental] DeepSeek V3.2

|
```bash
# download the FP8 checkpoint from Hugging Face
huggingface-cli download deepseek-ai/DeepSeek-V3.2-Exp --local-dir $HF_FP8_CKPT

# clone the DeepSeek-V3.2 GitHub repository for FP8 inference
git clone https://github.com/deepseek-ai/DeepSeek-V3.2-Exp.git && cd DeepSeek-V3.2-Exp && git checkout 3b99a53

# install requirements
pip install git+https://github.com/Dao-AILab/fast-hadamard-transform.git
pip install -r inference/requirements.txt
```

### Convert the Checkpoint

```bash
# convert the HF checkpoint to the format expected by the DeepSeek inference code
python inference/convert.py --hf-ckpt-path $HF_FP8_CKPT --save-path $DS_CKPT --n-experts 256 --model-parallel 8
```

|
## Post-training quantization

### Run the calibration scripts

|
For DeepSeek V3, R1, and V3.1:

|
```bash
torchrun --nproc-per-node 8 --master_port=12346 ptq.py --model_path $DS_CKPT --config DeepSeek-V3/inference/configs/config_671B.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $FP4_QUANT_PATH
```

|
For DeepSeek V3.2:

```bash
torchrun --nproc-per-node 8 --master_port=12346 ptq.py --model_path $DS_CKPT --config DeepSeek-V3.2-Exp/inference/config_671B_v3.2.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $FP4_QUANT_PATH
```

### Quantize the FP8 HF checkpoint to FP4

|
We provide a one-step script that will: