Eliminates unnecessary padding of the key and value tensors to multiples of 128 along the sequence-length dimension.
Removes the associated context saving and gradient unpadding operations, which are no longer needed once the sequence-length padding is gone.
Simplifies the forward and backward passes by removing the conditional padding logic for masks and biases.
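For illustration, the kind of round-up padding being removed can be sketched as follows. This is a hypothetical example, not code from the PR; the function name and shapes are illustrative:

```python
def pad_len(seq_len: int, multiple: int = 128) -> int:
    """Round seq_len up to the next multiple of `multiple` (the old behavior)."""
    return ((seq_len + multiple - 1) // multiple) * multiple

# Old path: key/value tensors were padded along the sequence dimension,
# masks and biases were padded to match, and the padding had to be
# stripped from gradients in the backward pass.
seq_len = 200
padded = pad_len(seq_len)   # rounds 200 up to 256
wasted = padded - seq_len   # 56 extra rows per tensor

# New path: tensors keep their true sequence length, so none of the
# padding or unpadding bookkeeping is needed.
```

Dropping the round-up avoids allocating and later discarding those extra rows, and removes the matching unpadding step from the backward pass.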