[0.9.1][bugfix] Fix wrong torchair graph cache directory for deepseek_mtp model (#2531)

linfeng-yuan · web-flow · commit 4e578f5ec1ea · 2025-08-26T19:03:39.000+08:00
### What this PR does / why we need it?
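Fix the inconsistency between the torchair graph cache directories used by
the deepseek model and its MTP model. The MTP proposer now passes
`cache_dir=TORCHAIR_CACHE_DIR` when compiling its torchair graph. Before
this fix, the mismatch caused an assertion error when running deepseek_mtp.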
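
The failure mode can be illustrated with a minimal, self-contained sketch. It does not use torchair or vllm-ascend; `SHARED_CACHE_DIR`, `DEFAULT_CACHE_DIR`, `resolve_cache_dir`, and `caches_agree` are hypothetical names introduced only for illustration, and the consistency check is an assumption about how such a mismatch might surface, not the actual assertion in the code base.

```python
import os
import tempfile

# Hypothetical stand-ins: neither path is taken from vllm-ascend or torchair.
SHARED_CACHE_DIR = os.path.join(tempfile.gettempdir(), "shared_graph_cache")
DEFAULT_CACHE_DIR = os.path.join(tempfile.gettempdir(), "default_graph_cache")


def resolve_cache_dir(cache_dir=None):
    """Return the directory a toy compile call would use for its cache.

    If the caller omits cache_dir, the call silently falls back to a default,
    which is how two call sites can end up in different directories.
    """
    return cache_dir if cache_dir is not None else DEFAULT_CACHE_DIR


def caches_agree(main_dir, draft_dir):
    """Toy consistency check between the main model and the draft (MTP) model."""
    return main_dir == draft_dir


# Main model: always passes the shared directory explicitly.
main_dir = resolve_cache_dir(SHARED_CACHE_DIR)

# Draft model before the fix: cache_dir omitted, so the default is used.
draft_dir_before = resolve_cache_dir()

# Draft model after the fix: the same shared directory is passed explicitly.
draft_dir_after = resolve_cache_dir(SHARED_CACHE_DIR)
```

In this sketch, `caches_agree(main_dir, draft_dir_before)` is false and `caches_agree(main_dir, draft_dir_after)` is true, which mirrors the one-line change in the diff: the compile call gains an explicit `cache_dir` argument.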

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passes, and end-to-end vLLM serving runs successfully.

Signed-off-by: linfeng-yuan <1102311262@qq.com>
diff --git a/vllm_ascend/worker/mtp_proposer_v1.py b/vllm_ascend/worker/mtp_proposer_v1.py
@@ -19,7 +19,7 @@
 from vllm_ascend.attention.utils import AscendCommonAttentionMetadata
 from vllm_ascend.distributed.utils import is_lmhead_tp
 from vllm_ascend.models.deepseek_mtp import CustomDeepSeekMTP
-from vllm_ascend.utils import ProfileExecuteDuration
+from vllm_ascend.utils import TORCHAIR_CACHE_DIR, ProfileExecuteDuration
 
 
 # FIXME(woosuk): The logic here is duplicated with the main sampling code.
@@ -423,6 +423,7 @@ def _get_torchair_lazy_compiled_model(self, batch_size: int):
                     self.model.__dict__[forward_proxy_name],
                     dynamic=True,
                     fullgraph=envs_vllm.VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE,
+                    cache_dir=TORCHAIR_CACHE_DIR,
                     config=config,
                     ge_cache=False)
             return self.torchair_compiled_models[batch_size]