[main][Doc] add mla pertoken quantization FAQ (vllm-project#2018)

kunpengW-code · web-flow · commit e3a2443c3a61 · 2025-07-27T08:47:51.000+08:00
### What this PR does / why we need it?

When using DeepSeek series model weights generated by modelslim with the `--dynamic` parameter, if torchair graph mode is enabled, the configuration file in the CANN package must be modified to prevent incorrect inference results. This PR adds the corresponding FAQ entry.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@7728dd7

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
diff --git a/docs/source/user_guide/feature_guide/quantization.md b/docs/source/user_guide/feature_guide/quantization.md
@@ -105,3 +105,21 @@ submit a issue, maybe some new models need to be adapted.
 ### 2. How to solve the error "Could not locate the configuration_deepseek.py"?
 
 Please convert DeepSeek series models using `modelslim-VLLM-8.1.RC1.b020_001` modelslim, this version has fixed the missing configuration_deepseek.py error.
+
+### 3. When converting DeepSeek series models with modelslim, what should you pay attention to?
+
+If you use weights generated by modelslim with the `--dynamic` parameter and torchair graph mode is enabled, modify the configuration file in the CANN package to prevent incorrect inference results.
+
+The steps are as follows:
+
+1. Search the CANN package directory in use for the configuration file, for example:
+   `find /usr/local/Ascend/ -name fusion_config.json`
+
+2. Add `"AddRmsNormDynamicQuantFusionPass":"off",` to the `fusion_config.json` file you found, at the following location:
+
+```json
+{
+    "Switch":{
+        "GraphFusion":{
+            "AddRmsNormDynamicQuantFusionPass":"off",
+```
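
The two FAQ steps can also be scripted. The following is a minimal sketch and not part of this PR: the helper names `disable_fusion_pass` and `patch_all` are hypothetical, and it assumes that `fusion_config.json` is plain JSON with the switch located under `Switch` and `GraphFusion`, as in the snippet above. It saves a `.bak` copy before changing a file.

```python
import json
from pathlib import Path

PASS_NAME = "AddRmsNormDynamicQuantFusionPass"


def disable_fusion_pass(config_path, pass_name=PASS_NAME):
    """Set Switch -> GraphFusion -> pass_name to "off" in one config file.

    Returns True if the file was rewritten, False if it was already "off".
    """
    path = Path(config_path)
    original = path.read_text(encoding="utf-8")
    config = json.loads(original)

    # Assumption: only the two levels shown in the FAQ snippet are known,
    # so any missing level is created instead of raising KeyError.
    graph_fusion = config.setdefault("Switch", {}).setdefault("GraphFusion", {})
    if graph_fusion.get(pass_name) == "off":
        return False

    graph_fusion[pass_name] = "off"

    # Keep a copy of the untouched file next to it before overwriting.
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return True


def patch_all(root="/usr/local/Ascend"):
    """Patch every fusion_config.json under root; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("fusion_config.json")):
        if disable_fusion_pass(path):
            changed.append(str(path))
    return changed
```

Calling `patch_all()` with no arguments searches the same example directory as the `find` command above. The script re-serializes the JSON, so key order is preserved but the original whitespace is not. Running it a second time changes nothing.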