How can I train the parameters in cross-attention?

Hello, I tried to fix a dimension issue and train the model, but when I check the parameters in the cross-attention layer, none of them have gradients. Is this expected behavior, or did I miss something? It seems odd, since the paper states that the cross-attention parameters are learnable.
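
For anyone trying to reproduce this, here is a minimal sketch of the kind of gradient check described above, assuming the model is written in PyTorch. The `Toy` module, the `report_missing_grads` helper, and the optional `context` argument are hypothetical stand-ins and not this repository's code. The sketch illustrates one possible cause of the symptom, not a confirmed diagnosis: if the cross-attention branch is skipped in the forward pass, its parameters are left with `grad` set to `None`.

```python
import torch
import torch.nn as nn


def report_missing_grads(model: nn.Module) -> list[str]:
    """List parameters that have no gradient after backward() has run."""
    missing = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            # Frozen parameter: autograd never tracks it.
            missing.append(f"{name} (requires_grad=False)")
        elif param.grad is None:
            # Trainable, but not part of the graph that produced the loss.
            missing.append(f"{name} (grad is None)")
    return missing


class Toy(nn.Module):
    """Hypothetical stand-in with a self-attention and a cross-attention layer."""

    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(8, 2, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(8, 2, batch_first=True)

    def forward(self, x, context=None):
        x, _ = self.self_attn(x, x, x)
        if context is not None:
            # Cross-attention only runs when a context tensor is passed in.
            out, _ = self.cross_attn(x, context, context)
            x = x + out
        return x


torch.manual_seed(0)
model = Toy()
x = torch.randn(2, 4, 8)

# Case 1: no context, so the cross-attention layer is never called.
model(x).sum().backward()
missing_without_context = report_missing_grads(model)

# Case 2: context provided, so the cross-attention layer is in the graph.
model.zero_grad(set_to_none=True)
model(x, context=torch.randn(2, 3, 8)).sum().backward()
missing_with_context = report_missing_grads(model)

print("without context:", missing_without_context)
print("with context:", missing_with_context)
```

If the real model shows the first pattern, it may be worth checking whether the conditioning input actually reaches the cross-attention layer, whether its output is used in the loss, and whether the parameters were frozen with `requires_grad=False` or the input was detached somewhere upstream.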