Commit d56e847

rraminent and tjruwase authored
Fix misplaced overflow handling return in fused_optimizer.py (#7645)
This PR fixes an issue in deepspeed/runtime/fp16/fused_optimizer.py where the gradient overflow handling logic incorrectly exited the function too early, resulting in wrong forward pass and loss calculations in certain FP16 training scenarios. The `return self.overflow` and `self.timers.log(OVERFLOW_TIMERS)` calls are now moved inside the `if self.overflow:` block so that the function returns early only when an actual overflow is detected.

Origin of the error: 889f0ea

cc: @jithunnair-amd

Co-authored-by: Olatunji Ruwase <[email protected]>
1 parent 02da373 commit d56e847

File tree

1 file changed: +3, -4 lines changed


deepspeed/runtime/fp16/fused_optimizer.py

Lines changed: 3 additions & 4 deletions
@@ -283,10 +283,9 @@ def step(self, closure=None):
             for i, group in enumerate(self.fp16_groups):
                 for p in group:
                     p.grad = None
-
-        if self.timers:
-            self.timers.log(OVERFLOW_TIMERS)
-        return self.overflow
+            if self.timers:
+                self.timers.log(OVERFLOW_TIMERS)
+            return self.overflow
 
         grads_groups_flat = []
         non_experts_grads_for_norm = []
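
For context, here is a minimal, self-contained sketch of the control flow this patch restores: the overflow-timer logging and the early return execute only inside the overflow branch. This is not the actual DeepSpeed implementation; the class and helper names below (_ToyFusedOptimizer, _ToyParam, _has_overflow) are illustrative placeholders, and only the placement of the early return mirrors the fix.

class _ToyParam:
    def __init__(self, grad):
        self.grad = grad


class _ToyFusedOptimizer:
    def __init__(self, fp16_groups, timers=None):
        self.fp16_groups = fp16_groups  # list of lists of _ToyParam
        self.timers = timers
        self.overflow = False

    def _has_overflow(self):
        # Stand-in for the real inf/NaN gradient check.
        return any(p.grad is not None and p.grad != p.grad  # NaN test
                   for group in self.fp16_groups for p in group)

    def step(self, closure=None):
        self.overflow = self._has_overflow()

        if self.overflow:
            # Overflow path: drop this step's gradients and skip the update.
            for group in self.fp16_groups:
                for p in group:
                    p.grad = None
            # Before the fix, the next three lines sat outside this `if`
            # block, so step() logged overflow timers and returned early on
            # every call, even when no overflow had occurred.
            if self.timers:
                self.timers.log(["overflow_timers"])
            return self.overflow

        # No-overflow path: the real optimizer unscales, clips, and applies
        # the fused parameter update here before returning.
        return self.overflow


# A NaN gradient triggers the early-return path; a finite one does not.
assert _ToyFusedOptimizer([[_ToyParam(float("nan"))]]).step() is True
assert _ToyFusedOptimizer([[_ToyParam(0.5)]]).step() is False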
