When does loss.backward() get called when accumulating gradients? #8460
-
Just wondering when backward gets called when you are accumulating over (say 8) batches. I had put a breakpoint in on_after_backward, but it only seemed to be hit once.
-
Dear @sachinruk,

This behaviour was a bug and should now be resolved on master: on_after_backward should be called after each backward call. In the case of accumulating over (say 8) batches, you should see on_after_backward called 8 times on master, and only once previously.

Best,
T.C
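
For reference, here is a minimal sketch (not from the original thread) of how one might verify this: a toy LightningModule counts calls to on_after_backward while training with accumulate_grad_batches=8. The class name, dataset sizes, and Trainer flags are illustrative assumptions, and seeing the hook fire once per backward call assumes a Lightning release that includes the fix.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class CountingModule(pl.LightningModule):
    """Toy module that counts how often on_after_backward fires."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)
        self.backward_calls = 0

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def on_after_backward(self):
        # Runs after each loss.backward(); with accumulate_grad_batches=8 this
        # should execute 8 times per optimizer step on releases containing the fix.
        self.backward_calls += 1

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


if __name__ == "__main__":
    # 64 samples with batch_size=8 -> 8 batches -> 1 accumulated optimizer step.
    dataset = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
    loader = DataLoader(dataset, batch_size=8)

    model = CountingModule()
    trainer = pl.Trainer(max_epochs=1, accumulate_grad_batches=8, logger=False)
    trainer.fit(model, loader)

    print(f"on_after_backward was called {model.backward_calls} times")
```

On a fixed version this should report 8 calls for the single epoch above; on the older, buggy behaviour described in the reply you would have seen it only once.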