Commit 8d366bd
[Transform] SpinQuant fix OOM (vllm-project#1976)
SUMMARY:
"Some fuse operations in SpinQuantModifier need the torch.no_grad
decorator. Without it, PyTorch records the autograd graph by default,
so memory usage grows steadily as the operations run. I hit a CUDA OOM
while rotating an MoE model, and adding the decorator resolved it."
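The effect described in the summary can be sketched in isolation. This is a minimal illustration, not the code changed by this commit: the function names `fuse_without_no_grad` and `fuse_with_no_grad` and the scaling step are hypothetical stand-ins for a fuse operation. It shows that a result computed from a parameter keeps a reference to the autograd graph unless the function runs under `torch.no_grad`.

```python
import torch


def fuse_without_no_grad(linear: torch.nn.Linear, scale: torch.Tensor) -> torch.Tensor:
    # Hypothetical fuse step: linear.weight requires grad, so autograd
    # records this multiply and keeps the graph alive with the result.
    return linear.weight * scale


@torch.no_grad()
def fuse_with_no_grad(linear: torch.nn.Linear, scale: torch.Tensor) -> torch.Tensor:
    # Same computation, but nothing is recorded inside the decorated function.
    return linear.weight * scale


linear = torch.nn.Linear(4, 4, bias=False)
scale = torch.ones(4)

tracked = fuse_without_no_grad(linear, scale)
untracked = fuse_with_no_grad(linear, scale)

print(tracked.grad_fn is not None)  # True: the result is attached to a graph
print(untracked.grad_fn is None)    # True: no graph was captured
```

When many such tensors are produced across the layers of a large model, the recorded graphs and the tensors they hold stay reachable for as long as the results do, which is consistent with the gradual memory growth reported above.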
TEST PLAN:
"Performed code quality evaluation locally"
Signed-off-by: LeiZhang <isleizhang@outlook.com>
1 parent 296d48f commit 8d366bd
1 file changed, 1 addition and 0 deletions
Diff: one line added at new line 147 (between original lines 146 and 147)