Visualizing ViT attention maps

Hi, author.

How did you produce the attention map visualizations shown in your results? Which module are they taken from?

1) The encoder (ViT)?
2) The decoder (ViT)?

As I understand it, given an input x, the pipeline is y = encoder(x) followed by decoder(y). Is the attention map taken from the final ViT block of the decoder?
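
To make the question concrete, here is a minimal sketch of what I mean by an "attention map". It is only my assumption of the usual approach, not your actual method: I take the query of a [CLS] token (assumed to sit at index 0) and the keys of all tokens from one attention head of the chosen block, compute softmax(q k^T / sqrt(d)), drop the [CLS] entry, and reshape the rest to the patch grid. The function name `cls_attention_map` and the toy inputs are hypothetical and do not come from your code.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cls_attention_map(q_cls, keys, grid_h, grid_w):
    """Attention of the [CLS] query over patch tokens, as a grid.

    q_cls: query vector of the [CLS] token (length d), one head.
    keys:  key vectors for [CLS] followed by grid_h * grid_w patches.
    Assumes a [CLS] token at index 0 (hypothetical layout).
    """
    d = len(q_cls)
    assert len(keys) == 1 + grid_h * grid_w
    # Scaled dot-product scores between the [CLS] query and every key.
    scores = [sum(a * b for a, b in zip(q_cls, k)) / math.sqrt(d) for k in keys]
    attn = softmax(scores)
    patch_attn = attn[1:]  # drop the [CLS]-to-[CLS] weight
    # Reshape the flat patch weights into grid_h rows of grid_w values.
    return [patch_attn[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]

# Toy example: 2x2 patch grid, d = 2, first patch aligned with the query.
q_cls = [1.0, 0.0]
keys = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
heat = cls_attention_map(q_cls, keys, 2, 2)
```

The resulting grid would then be upsampled to the image size and overlaid on x. My question is whether q and k in your figures come from the last encoder block or from the last decoder block.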