The assert felt morally valid: if no gradients are scaled, then something
is almost certainly wrong with the setup. In one instance, PP +
optimizer-in-backward (in torchtitan) resulted in grad=None after
running .backward() and before scaling grads.
On the other hand, the existing assert is too restrictive. It's
possible that a model used with pipelining has some parameters that
do not receive gradients, and we shouldn't hard-error in these
cases (e.g. if a parameter is literally unused, or is frozen).
In the extreme case, the whole stage could be frozen. So we no longer
complain if no grads are scaled.
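
For context, a minimal sketch of the lenient behavior described above
(the names `scale_grads` and `grad_scale_factor` are illustrative, not
the exact API of the stage code):

```python
import torch

def scale_grads(stage_module: torch.nn.Module, grad_scale_factor: int) -> None:
    """Scale the gradients of a pipeline stage's parameters by 1/grad_scale_factor.

    Parameters whose .grad is None (e.g. frozen or unused parameters, or an
    entirely frozen stage) are simply skipped; there is no assert that at
    least one gradient was actually scaled.
    """
    if grad_scale_factor == 1:
        return  # nothing to scale
    for p in stage_module.parameters():
        if p.grad is not None:
            p.grad.div_(grad_scale_factor)
```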
Pull Request resolved: pytorch#145010
Approved by: https://github.com/mori360, https://github.com/tianyu-l