Commit 40263cf

tushar00jain authored and facebook-github-bot committed
fix allreduce usage (meta-pytorch#279)
Summary: Pass a full AllreduceOptions object, rather than a bare reduce op, to the process group's allreduce so that the watchdog abort is not triggered.

Differential Revision: D84101243
1 parent 6080eba commit 40263cf

1 file changed: +3 -1 lines changed

torchft/manager.py

Lines changed: 3 additions & 1 deletion
@@ -423,7 +423,9 @@ def allreduce(
                     torch.accelerator.current_stream(),
                 )
             else:
-                work = self._pg.allreduce([tensor], reduce_op)
+                opts = AllreduceOptions()
+                opts.reduceOp = reduce_op
+                work = self._pg.allreduce([tensor], opts)

             # schedule grad normalization as a continuation
             # on the Future
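
The pattern in this change can be illustrated with a minimal, torch-free sketch. Everything below is an assumption made for illustration: StubReduceOp, StubAllreduceOptions, RecordingProcessGroup, and build_allreduce_opts are hypothetical stand-ins and are not part of torchft or PyTorch, and the timeout field and its default are placeholders. The sketch only shows the call shape (build an options object, set reduceOp, pass the object to allreduce); it does not reproduce the watchdog behavior.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class StubReduceOp(Enum):
    """Hypothetical stand-in for a reduce-op enum (not the real ReduceOp)."""
    SUM = "sum"
    AVG = "avg"


@dataclass
class StubAllreduceOptions:
    """Hypothetical stand-in for AllreduceOptions; defaults are illustrative."""
    reduceOp: StubReduceOp = StubReduceOp.SUM
    timeout: timedelta = timedelta(seconds=0)


def build_allreduce_opts(reduce_op, opts_cls=StubAllreduceOptions):
    """Wrap a bare reduce op in an options object, mirroring the patched lines."""
    opts = opts_cls()
    opts.reduceOp = reduce_op
    return opts


class RecordingProcessGroup:
    """Hypothetical process group that only records how allreduce was called."""

    def __init__(self):
        self.calls = []

    def allreduce(self, tensors, opts):
        # A real process group would launch a collective here.
        self.calls.append((tensors, opts))
        return opts


pg = RecordingProcessGroup()
opts = build_allreduce_opts(StubReduceOp.AVG)
pg.allreduce([[1.0, 2.0]], opts)
```

With PyTorch installed, the same helper could presumably be given the real options class through opts_cls (the diff uses a class named AllreduceOptions), but that path is not exercised here.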
