ViT MLP classification head (maybe an error?) #627
AlbertoFormaggio1 asked this question in Q&A (unanswered)
Replies: 1 comment
-
Hey @AlbertoFormaggio1, great question! Yes, you're right: a pure replication of the paper would include the MLP head with a tanh non-linearity as the output. However, I decided to make it simpler in light of the authors releasing a follow-up to the original Vision Transformer paper called "Better plain ViT baselines for ImageNet-1k". That paper introduced a few simplifications to the ViT architecture while still retaining performance. See resources:
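To make the simplification concrete, here's a rough sketch (not the course's actual code) of a head in the spirit of that paper: a single linear layer on globally average-pooled patch tokens, instead of a tanh MLP on a [class] token. The sizes below are illustrative assumptions (embedding_dim=768 as in ViT-Base, 1000 ImageNet-1k classes).

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not taken from the course code):
embedding_dim = 768  # ViT-Base embedding dimension
num_classes = 1000   # ImageNet-1k

# "Better plain ViT baselines"-style head: a single linear layer applied to
# the globally average-pooled patch tokens (no [class] token, no tanh MLP).
head = nn.Linear(in_features=embedding_dim, out_features=num_classes)

tokens = torch.randn(1, 196, embedding_dim)  # (batch, num_patches, embedding_dim)
logits = head(tokens.mean(dim=1))            # global average pool -> linear
print(logits.shape)                          # torch.Size([1, 1000])
```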
Did you happen to try the different architectures (e.g. the paper's vs. the course's)? It would be cool to see how they compare. Let me know if you have any other questions.

Daniel
-
Dear all,
I was looking at the ViT tutorial and found something I don't understand (or that may have been missed by the writer of the tutorial).
In Appendix D.3 of the ViT paper, it is said that the classification head is implemented as an MLP with one hidden layer and a tanh non-linearity at pre-training time, and as a single linear layer at fine-tuning time.
However, in the tutorial notebook we used only a LayerNorm followed by a Linear layer.
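To make the difference concrete, here is a minimal side-by-side sketch of the two heads (the sizes are illustrative assumptions; embedding_dim=768 matches ViT-Base):

```python
import torch
from torch import nn

embedding_dim = 768  # ViT-Base embedding dimension (illustrative)
num_classes = 3      # arbitrary number of classes for the sketch

# What the tutorial notebook builds: LayerNorm then a single Linear layer.
notebook_head = nn.Sequential(
    nn.LayerNorm(normalized_shape=embedding_dim),
    nn.Linear(in_features=embedding_dim, out_features=num_classes),
)

# What Appendix D.3 describes at pre-training time: an MLP with one
# hidden layer and a tanh non-linearity.
paper_pretrain_head = nn.Sequential(
    nn.Linear(in_features=embedding_dim, out_features=embedding_dim),
    nn.Tanh(),
    nn.Linear(in_features=embedding_dim, out_features=num_classes),
)

x = torch.randn(1, embedding_dim)    # e.g. the [class] token embedding
print(notebook_head(x).shape)        # torch.Size([1, 3])
print(paper_pretrain_head(x).shape)  # torch.Size([1, 3])
```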
What am I missing?
Thank you in advance.