[Transform] SpinQuant fix OOM (vllm-project#1976)

zhanglei1172 · web-flow · commit 8d366bd4b4b4 · 2025-10-31T00:05:03.000-04:00
SUMMARY:
"Some of the fuse operations that SpinQuantModifier performs must run
under the torch.no_grad decorator. Otherwise, PyTorch records the
autograd graph by default, and memory usage grows gradually. I hit a
CUDA OOM while rotating an MoE model, and the OOM went away after
adding the decorator."
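
A minimal sketch of the effect described above, assuming only stock PyTorch. The helpers `fuse_scale` and `fuse_scale_no_grad` are hypothetical stand-ins for a fuse step and are not part of llm-compressor. It shows that the same arithmetic on a parameter yields a graph-attached tensor without the decorator and a detached one with it:

```python
import torch


def fuse_scale(weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Hypothetical fuse step: autograd records this multiply because
    # `weight` requires grad, so the result keeps a reference to the graph.
    return weight * scale


@torch.no_grad()
def fuse_scale_no_grad(weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Same arithmetic, but graph recording is disabled inside the function.
    return weight * scale


weight = torch.nn.Parameter(torch.ones(4, 4))  # requires_grad=True by default
scale = torch.full((4,), 2.0)

tracked = fuse_scale(weight, scale)
untracked = fuse_scale_no_grad(weight, scale)

print(tracked.requires_grad, tracked.grad_fn is not None)   # True True
print(untracked.requires_grad, untracked.grad_fn is None)   # False True
```

The tracked result holds on to the tensors autograd would need for a backward pass. Results produced under `torch.no_grad()` carry no such references, so intermediates can be freed as soon as they go out of scope.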


TEST PLAN:
"Performed code quality evaluation locally"

Signed-off-by: LeiZhang <isleizhang@outlook.com>
diff --git a/src/llmcompressor/modifiers/transform/spinquant/base.py b/src/llmcompressor/modifiers/transform/spinquant/base.py
@@ -144,6 +144,7 @@ def on_initialize(self, state: State, **kwargs) -> bool:
 
         return True
 
+    @torch.no_grad()
     def on_start(self, state: State, event: Event, **kwargs):
         self.started_ = True