In cross-attention, the channel dimension of q (from the image) does not match that of k and v (from the condition). How can I make the channels of q and k the same?

1. Put k and v into a zero matrix with the same shape as the image (i.e. zero-pad them).
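
The zero-padding idea from option 1 could be sketched roughly as follows. This is only a minimal NumPy illustration, not a definitive implementation: the token counts, channel sizes, and the helper names `zero_pad_channels` and `cross_attention` are all hypothetical, and it assumes only the channel axis needs to match (the number of condition tokens does not have to equal the number of image tokens for attention to work).

```python
import numpy as np

def zero_pad_channels(x, target_channels):
    """Copy x (tokens, channels) into a zero matrix with target_channels columns."""
    tokens, channels = x.shape
    if channels > target_channels:
        raise ValueError("condition has more channels than the image")
    padded = np.zeros((tokens, target_channels), dtype=x.dtype)
    padded[:, :channels] = x  # original features first, zeros after
    return padded

def cross_attention(q, k, v):
    """Scaled dot-product attention: q is (Nq, C), k and v are (Nk, C)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((16, 64))  # assumed: 16 image tokens, 64 channels
cond_tokens = rng.standard_normal((4, 32))    # assumed: 4 condition tokens, 32 channels

# Pad the condition up to the image's channel count, then use it as k and v.
kv = zero_pad_channels(cond_tokens, image_tokens.shape[-1])
out = cross_attention(image_tokens, kv, kv)
print(out.shape)
```

Note that with this approach the padded channels of k are zero, so the matching channels of q never contribute to the attention scores, and the padded channels of the output stay zero as well. A learned linear projection of q, k, and v into a shared dimension is the more common way to reconcile mismatched channels.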