q (image) and k, v (condition) channel mismatch

In cross attention, the channel dimension of q (from the image) does not match the channel dimension of k and v (from the condition).
How can I make the channels of q and k the same?
1. Copy k and v into a zero matrix with the same shape as the image (zero-padding).
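
A minimal sketch of idea 1, using NumPy only to keep it framework-independent. The shapes (16 image tokens with 64 channels, 4 condition tokens with 32 channels) and the helper names `zero_pad_channels` and `cross_attention` are made up for illustration. It also assumes that "same shape as the image" means matching the channel dimension only, since the number of k/v tokens does not have to equal the number of q tokens in cross attention.

```python
import numpy as np

def zero_pad_channels(cond, target_channels):
    """Copy cond (n_tokens, c_cond) into a zero matrix (n_tokens, target_channels)."""
    n_tokens, c_cond = cond.shape
    if c_cond > target_channels:
        raise ValueError("condition has more channels than the image features")
    padded = np.zeros((n_tokens, target_channels), dtype=cond.dtype)
    padded[:, :c_cond] = cond  # the remaining channels stay zero
    return padded

def cross_attention(q, k, v):
    """Scaled dot-product attention; q: (n_q, d), k and v: (n_kv, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over k/v tokens
    return weights @ v

rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((16, 64))  # q: hypothetical image features
cond_tokens = rng.standard_normal((4, 32))    # k, v source: hypothetical condition

kv = zero_pad_channels(cond_tokens, image_tokens.shape[-1])
out = cross_attention(image_tokens, kv, kv)
print(out.shape)  # (16, 64)
```

With this padding, the padded channels of v are zero, so the last 32 output channels are zero as well, and q's last 32 channels never contribute to the attention scores. If that is a problem, the usual alternative is a learned linear projection that maps q, k, and v to a common channel size before attention.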