Commit 7ae15d9

add nvfp4 benchmark (Tencent#77)
1 parent 6adea79 commit 7ae15d9

File tree

3 files changed: +25 -0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@
 - [Technical Discussion](#技术交流)

 ## 📣 Latest Updates
+- [25/09/24] We support NVFP4 PTQ quantization for the Qwen3 series models, and open-source the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
 - [25/09/01] We support FP8 quantization of the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) open-source translation model; Torch inference and benchmark evaluation for Eagle3; quantization and Cache for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux); and quantization compression for the [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model.
 - [25/08/06] We support FP8 and INT4 quantization for `Hunyuan 0.5B/1.8B/4B/7B` and `Qwen2.5VL 3B/7B/32B/72B`, and `FP8-Static` and `W4A8-FP8` quantization for `DeepSeek-R1/V3` and `Kimi-K2`. We also open-source Eagle3 weights for the `Hunyuan 1.8B/4B/7B` model series.
 - [25/07/04] We support quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including the INT8, FP8, and INT4 algorithms.

README_en.md

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 - [Technical Discussion](#technical-discussion)

 ## 📣 Latest Updates
+- [25/09/24] We now support NVFP4 PTQ quantization for the Qwen3 series models. We also open-source the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
 - [25/09/01] We now support FP8 quantization of the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) translation model, Torch inference and benchmark evaluation for Eagle3, quantization and Cache for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux), and quantization for [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss).
 - [25/08/06] We now support quantization for `Hunyuan 0.5B/1.8B/4B/7B` and the multimodal model `Qwen2.5VL 3B/7B/32B/72B` with the `FP8/INT4` algorithms, and quantization for `DeepSeek-R1/V3` and `Kimi-K2` with the `FP8-Static` and `W4A8-FP8` algorithms. We also open-source the `Hunyuan 1.8B/4B/7B` series Eagle3 model weights.
 - [25/07/04] We now support quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including the `INT8/FP8/INT4` algorithms. We also open-source the `Qwen3` series Eagle3 model weights.

docs/source/performance/quantization/benchmarks.md

Lines changed: 23 additions & 0 deletions
@@ -386,3 +386,26 @@ INT4-GPTAQ evaluation results on `GSM8K`, `HUMANEVAL`, and `GPQA Diamond` are as
 |           | INT4-GPTAQ   | 69.52 | 37.20     | -            |
 +-----------+--------------+-------+-----------+--------------+
 ```
+
+
+## NVFP4
+
+NVFP4 evaluation results on `GSM8K`, `MMLU`, and `GPQA Diamond` are as follows:
+
+```{eval-rst}
+.. table::
+    :align: center
+    :name: table-NVFP4-performance
+
+    +-----------------+--------------+-------+-------+--------------+
+    | Model           | Quantization | GSM8K | MMLU  | GPQA Diamond |
+    +=================+==============+=======+=======+==============+
+    | Qwen3-32B       | BF16         | 67.06 | 81.72 | 54.04        |
+    +                 +--------------+-------+-------+--------------+
+    |                 | NVFP4        | 69.87 | 80.74 | 56.06        |
+    +-----------------+--------------+-------+-------+--------------+
+    | Qwen3-235B-A22B | BF16         | 96.63 | 62.73 | 60.60        |
+    +                 +--------------+-------+-------+--------------+
+    |                 | NVFP4        | 96.17 | 62.09 | 60.10        |
+    +-----------------+--------------+-------+-------+--------------+
+```
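As background for the benchmark table above, a minimal fake-quantization sketch of what an NVFP4-style PTQ pass does to a weight tensor: values are grouped into blocks of 16, each block is scaled so its maximum lands on the FP4 (E2M1) top value of 6.0, and magnitudes are rounded to the nearest FP4 grid point. This is an illustrative approximation, not AngelSlim's actual implementation; the names `fake_quant_nvfp4`, `FP4_GRID`, and `BLOCK` are hypothetical, and the real NVFP4 format stores per-block scales in FP8 (E4M3) with an additional per-tensor scale, which is simplified to a full-precision per-block scale here.

```python
import numpy as np

# Representable FP4 (E2M1) magnitudes; signs are handled separately.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 block size

def fake_quant_nvfp4(w: np.ndarray) -> np.ndarray:
    """Quantize-dequantize a 1-D weight vector, NVFP4-style (sketch).

    Note: per-block scales are kept in full precision here; the real
    format quantizes them to FP8 (E4M3) with a per-tensor FP32 scale.
    """
    pad = (-len(w)) % BLOCK
    x = np.pad(w.astype(np.float64), (0, pad))
    blocks = x.reshape(-1, BLOCK)
    # One scale per block so the block's absolute max maps to FP4's max (6.0).
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / 6.0, 1.0)
    scaled = blocks / scale
    # Round each magnitude to the nearest FP4 grid point, keeping the sign.
    mag = np.abs(scaled)
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    deq = np.sign(scaled) * FP4_GRID[idx] * scale
    return deq.reshape(-1)[: len(w)]

# Tiny demo: the block max (1.3) is reproduced exactly, smaller values
# snap to the nearest scaled FP4 grid point.
w = np.array([0.01, -0.2, 0.5, 1.3])
wq = fake_quant_nvfp4(w)
```

The key design point, which is why NVFP4 holds up at 4 bits in the table above, is the small block size: with only 16 values sharing a scale, a single outlier degrades at most its own block rather than the whole tensor.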
