Commit d56e847

rraminent and tjruwase authored
Fix misplaced overflow handling return in fused_optimizer.py (#7645)
This PR fixes an issue in deepspeed/runtime/fp16/fused_optimizer.py where the gradient overflow handling logic incorrectly exited the function too early, resulting in wrong forward pass and loss calculations in certain FP16 training scenarios. The `return self.overflow` and `self.timers.log(OVERFLOW_TIMERS)` calls are now moved inside the `if self.overflow:` block so that the function returns early only when an actual overflow is detected.

Origin of the error: 889f0ea

cc: @jithunnair-amd

Co-authored-by: Olatunji Ruwase <[email protected]>
1 parent 02da373 commit d56e847

File tree

1 file changed: +3, -4 lines changed


deepspeed/runtime/fp16/fused_optimizer.py

Lines changed: 3 additions & 4 deletions
@@ -283,10 +283,9 @@ def step(self, closure=None):
             for i, group in enumerate(self.fp16_groups):
                 for p in group:
                     p.grad = None
-
-        if self.timers:
-            self.timers.log(OVERFLOW_TIMERS)
-        return self.overflow
+            if self.timers:
+                self.timers.log(OVERFLOW_TIMERS)
+            return self.overflow
 
         grads_groups_flat = []
         non_experts_grads_for_norm = []
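
For context, here is a minimal, self-contained sketch of the control flow this patch restores: the overflow-timer logging and the early return execute only inside the overflow branch. This is not the actual DeepSpeed implementation; the class and helper names below (_ToyFusedOptimizer, _ToyParam, _has_overflow) are illustrative placeholders, and only the placement of the early return mirrors the fix.

class _ToyParam:
    def __init__(self, grad):
        self.grad = grad


class _ToyFusedOptimizer:
    def __init__(self, fp16_groups, timers=None):
        self.fp16_groups = fp16_groups  # list of lists of _ToyParam
        self.timers = timers
        self.overflow = False

    def _has_overflow(self):
        # Stand-in for the real inf/NaN gradient check.
        return any(p.grad is not None and p.grad != p.grad  # NaN test
                   for group in self.fp16_groups for p in group)

    def step(self, closure=None):
        self.overflow = self._has_overflow()

        if self.overflow:
            # Overflow path: drop this step's gradients and skip the update.
            for group in self.fp16_groups:
                for p in group:
                    p.grad = None
            # Before the fix, the next three lines sat outside this `if`
            # block, so step() logged overflow timers and returned early on
            # every call, even when no overflow had occurred.
            if self.timers:
                self.timers.log(["overflow_timers"])
            return self.overflow

        # No-overflow path: the real optimizer unscales, clips, and applies
        # the fused parameter update here before returning.
        return self.overflow


# A NaN gradient triggers the early-return path; a finite one does not.
assert _ToyFusedOptimizer([[_ToyParam(float("nan"))]]).step() is True
assert _ToyFusedOptimizer([[_ToyParam(0.5)]]).step() is False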
