The custom checkpoint helper in this repo re-runs the forward pass during backprop without restoring the RNG state. Every stochastic layer inside the checkpointed block (e.g. dropout) therefore samples a different random mask during the recomputation, so the recomputed activations, and the gradients derived from them, no longer correspond to the loss that was actually computed. The result is that enabling gradient checkpointing with non-zero dropout makes the loss diverge.
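Below is a minimal sketch of the usual fix, modeled on how `torch.utils.checkpoint.checkpoint` behaves with `preserve_rng_state=True`: snapshot the CPU (and, if present, CUDA) RNG state in `forward`, and restore it just before the recomputation in `backward` so dropout draws the same masks both times. The class name `CheckpointWithRNG` is hypothetical and not the repo's actual helper; the sketch also assumes every argument is a tensor.

```python
import torch


class CheckpointWithRNG(torch.autograd.Function):
    """Illustrative checkpoint helper that replays the RNG state on recompute."""

    @staticmethod
    def forward(ctx, run_function, *args):
        ctx.run_function = run_function
        # Snapshot the RNG state so backward can replay the same dropout masks.
        ctx.cpu_rng_state = torch.get_rng_state()
        ctx.has_cuda = torch.cuda.is_available()
        if ctx.has_cuda:
            ctx.cuda_rng_state = torch.cuda.get_rng_state()
        ctx.save_for_backward(*args)
        with torch.no_grad():
            return run_function(*args)

    @staticmethod
    def backward(ctx, *grad_outputs):
        # Stash the current RNG state, then restore the state captured in
        # forward so the recomputed pass uses identical random masks.
        cpu_state_now = torch.get_rng_state()
        torch.set_rng_state(ctx.cpu_rng_state)
        if ctx.has_cuda:
            cuda_state_now = torch.cuda.get_rng_state()
            torch.cuda.set_rng_state(ctx.cuda_rng_state)

        detached = [x.detach().requires_grad_(x.requires_grad)
                    for x in ctx.saved_tensors]
        with torch.enable_grad():
            outputs = ctx.run_function(*detached)

        # Put the interrupted RNG state back so unrelated code is unaffected.
        torch.set_rng_state(cpu_state_now)
        if ctx.has_cuda:
            torch.cuda.set_rng_state(cuda_state_now)

        if isinstance(outputs, torch.Tensor):
            outputs = (outputs,)
        torch.autograd.backward(outputs, grad_outputs)
        # One None for run_function, then the input gradients.
        return (None,) + tuple(x.grad for x in detached)


# Usage: out = CheckpointWithRNG.apply(block, x)
```

With this pattern the forward and recomputed passes are stochastic-consistent, so gradients again match the loss that was backpropagated; the simpler alternative is to drop the custom helper and call `torch.utils.checkpoint.checkpoint`, which preserves the RNG state by default.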