Do you know why energy-based model (EBM) architectures are always parametrized as convolutional models and never use self-attention?