Hello, I tried to fix a dimension issue and train the model, but when I inspect the parameters of the cross-attention layer, none of them have gradients. Is this expected behavior, or did I miss something? It seems odd, since the paper states that the cross-attention parameters are learnable.
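For anyone trying to reproduce this, here is a minimal sketch of the kind of check described above. It assumes a PyTorch-style model whose `named_parameters()` yields `(name, param)` pairs with `requires_grad` and `grad` attributes; the post does not name the framework, so that is an assumption. The helper `report_missing_grads`, the `cross_attn` keyword, and the parameter names are all hypothetical and would need to match the actual model.

```python
from types import SimpleNamespace


def report_missing_grads(named_params, keyword="cross_attn"):
    """Group parameters whose name contains `keyword` by gradient status.

    Hypothetical helper: `named_params` is any iterable of (name, param)
    pairs where param exposes `requires_grad` and `grad`.
    """
    report = {"frozen": [], "no_grad": [], "has_grad": []}
    for name, param in named_params:
        if keyword not in name:
            continue
        if not param.requires_grad:
            # Frozen parameters never receive a gradient.
            report["frozen"].append(name)
        elif param.grad is None:
            # Trainable, but no gradient was stored for it, e.g. the
            # parameter did not contribute to the loss, or backward()
            # has not been called yet.
            report["no_grad"].append(name)
        else:
            report["has_grad"].append(name)
    return report


# Stand-in objects so the sketch runs without any framework installed.
# With a real model, pass model.named_parameters() after loss.backward().
params = [
    ("decoder.cross_attn.q_proj.weight", SimpleNamespace(requires_grad=False, grad=None)),
    ("decoder.cross_attn.k_proj.weight", SimpleNamespace(requires_grad=True, grad=None)),
    ("decoder.mlp.fc.weight", SimpleNamespace(requires_grad=True, grad=[0.1])),
]
result = report_missing_grads(params)
print(result)
```

Splitting the result this way shows whether the layer was frozen on purpose (`frozen`) or is trainable but received no gradient (`no_grad`), which are two different causes of the same symptom.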