Commit e3a2443

[main][Doc] add mla pertoken quantization FAQ (vllm-project#2018)
### What this PR does / why we need it?

When using deepseek series models generated by the `--dynamic` parameter, if torchair graph mode is enabled, we should modify the configuration file in the CANN package to prevent incorrect inference results.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@7728dd7

---------

Signed-off-by: Wang Kunpeng <[email protected]>
1 parent 5b579dd commit e3a2443

1 file changed: +18 -0 lines changed


docs/source/user_guide/feature_guide/quantization.md

Lines changed: 18 additions & 0 deletions
@@ -105,3 +105,21 @@ submit a issue, maybe some new models need to be adapted.
 ### 2. How to solve the error "Could not locate the configuration_deepseek.py"?
 
 Please convert DeepSeek series models using `modelslim-VLLM-8.1.RC1.b020_001` modelslim; this version has fixed the missing configuration_deepseek.py error.
+
+### 3. When converting DeepSeek series models with modelslim, what should you pay attention to?
+
+When using weights generated by modelslim with the `--dynamic` parameter, if torchair graph mode is enabled, please modify the configuration file in the CANN package to prevent incorrect inference results.
+
+The operation steps are as follows:
+
+1. Search for `fusion_config.json` in the CANN package directory in use, for example:
+   find /usr/local/Ascend/ -name fusion_config.json
+
+2. Add `"AddRmsNormDynamicQuantFusionPass":"off",` to the fusion_config.json you found, at the following location:
+
+```bash
+{
+    "Switch":{
+        "GraphFusion":{
+            "AddRmsNormDynamicQuantFusionPass":"off",
+```
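
The two steps described in the FAQ entry above could also be scripted. The following is a minimal Python sketch and is not part of this commit. It assumes that `fusion_config.json` is plain JSON with `GraphFusion` nested under `Switch` as shown in the fragment, that the position of the key inside `GraphFusion` does not matter, and that you have write permission on the file. The helper names `find_fusion_configs` and `disable_fusion_pass` are hypothetical.

```python
import json
import shutil
from pathlib import Path

# Name of the fusion pass that the FAQ entry says to switch off.
PASS_NAME = "AddRmsNormDynamicQuantFusionPass"


def find_fusion_configs(root="/usr/local/Ascend/"):
    """Return every fusion_config.json below root, similar to
    `find <root> -name fusion_config.json`."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("fusion_config.json"))


def disable_fusion_pass(config_path, pass_name=PASS_NAME):
    """Set Switch.GraphFusion.<pass_name> to "off" in one config file.

    Returns True if the file was rewritten, False if the pass was
    already set to "off". Assumes the file is plain JSON.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Create the nested sections if they are missing (an assumption;
    # a real CANN file is expected to contain them already).
    graph_fusion = config.setdefault("Switch", {}).setdefault("GraphFusion", {})
    if graph_fusion.get(pass_name) == "off":
        return False

    # Keep a one-time backup next to the original before rewriting it.
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        shutil.copy2(path, backup)

    graph_fusion[pass_name] = "off"
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return True
```

Calling `disable_fusion_pass(p)` for each path returned by `find_fusion_configs()` applies the change to every match. Note that the file is re-serialized, so its original whitespace is not preserved; the `.bak` copy keeps the untouched version.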
