Commit 87a0b7b
[bugfix] adapt bugfix for norm_quant_fusion_pass to npugraph_ex (#6726)
### What this PR does / why we need it?
This PR adapts bugfixes from `norm_quant_fusion_pass` to
`graphex_norm_quant_fusion_pass` for the `npugraph_ex` backend.
The main changes are:
- Replaced `torch.ops.npu.npu_add_rms_norm` with
`torch.ops._C_ascend.npu_add_rms_norm_bias`.
- For patterns without bias, `None` is passed as the bias argument.
- For patterns with bias, the separate `add` operation for the bias is
removed and the bias is passed directly to `npu_add_rms_norm_bias`, so
the bias addition is folded into the fused kernel instead of running as
a standalone op.
These changes ensure consistency and correctness for RMSNorm and
quantization fusion patterns when using `npugraph_ex`.
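The rewrite described above can be sketched as a small pattern-rewriting pass. This is a toy illustration over a flat op list, not the actual vllm-ascend fusion pass (which operates on a traced graph); the op names mirror the PR, but `fuse_norm_quant` and its op-tuple representation are hypothetical.

```python
# Toy sketch of the fusion rule from this PR:
#   [add(bias), add_rms_norm]  ->  [add_rms_norm_bias(bias)]
#   [add_rms_norm]             ->  [add_rms_norm_bias(bias=None)]
# Each op is a (name, bias_arg) tuple; real passes match torch.fx nodes.

def fuse_norm_quant(ops):
    fused = []
    i = 0
    while i < len(ops):
        name, arg = ops[i]
        nxt = ops[i + 1][0] if i + 1 < len(ops) else None
        if name == "add" and nxt == "add_rms_norm":
            # Bias-add followed by the norm: fold the bias into the
            # fused kernel and drop the standalone add.
            fused.append(("add_rms_norm_bias", arg))
            i += 2
        elif name == "add_rms_norm":
            # Pattern without bias: pass None as the bias argument.
            fused.append(("add_rms_norm_bias", None))
            i += 1
        else:
            fused.append((name, arg))
            i += 1
    return fused
```

For example, `[("add", b), ("add_rms_norm", None)]` collapses to a single `("add_rms_norm_bias", b)` op, which is the behavior change this PR ports to the `npugraph_ex` pass.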
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
vllm-project/vllm@9562912
Signed-off-by: huyuanquan1 <huyuanquan1@huawei.com>
Co-authored-by: huyuanquan1 <huyuanquan1@huawei.com>
Parent: 41d056f
File tree: 1 file changed (+12 −6) — vllm_ascend/compilation/npugraph_ex_passes
[Diff contents were not captured in this export; only line numbers survive. The +12/−6 change spans four hunks, replacing single lines around original lines 61, 126–129, 191, and 258–261 with three-line additions each.]