ViT MLP classification head (maybe an error?) #627
AlbertoFormaggio1 asked this question in Q&A (unanswered)
Replies: 1 comment
-
Hey @AlbertoFormaggio1, great question! Yes, you're right: a pure replication of the paper would include the MLP head with a tanh non-linearity as the output. However, I decided to make it simpler in light of the authors releasing a follow-up to the original Vision Transformer paper called "Better plain ViT baselines for ImageNet-1k". That paper introduced a few simplifications to the ViT architecture while still retaining performance. See resources:
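To make the simplification concrete, here's a rough sketch (not the course's actual code) of a head in the spirit of that paper: a single linear layer on globally average-pooled patch tokens, instead of a tanh MLP on a [class] token. The sizes below are illustrative assumptions (embedding_dim=768 as in ViT-Base, 1000 ImageNet-1k classes).

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not taken from the course code):
embedding_dim = 768  # ViT-Base embedding dimension
num_classes = 1000   # ImageNet-1k

# "Better plain ViT baselines"-style head: a single linear layer applied to
# the globally average-pooled patch tokens (no [class] token, no tanh MLP).
head = nn.Linear(in_features=embedding_dim, out_features=num_classes)

tokens = torch.randn(1, 196, embedding_dim)  # (batch, num_patches, embedding_dim)
logits = head(tokens.mean(dim=1))            # global average pool -> linear
print(logits.shape)                          # torch.Size([1, 1000])
```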
Did you happen to try the different architectures (e.g. the paper's vs. the course's)? It would be cool to see how they compare. Let me know if you have any other questions.

Daniel
-
Dear all,
I was looking at the ViT tutorial and found something I don't understand (or that may have been missed by the writer of the tutorial).
In Appendix D.3 of the ViT paper, it is said that the classification head is implemented as an MLP with one hidden layer and a tanh non-linearity at pre-training time, and as a single linear layer at fine-tuning time.
However, in the tutorial notebook we used only a LayerNorm followed by a Linear layer.
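To make the difference concrete, here is a minimal side-by-side sketch of the two heads (the sizes are illustrative assumptions; embedding_dim=768 matches ViT-Base):

```python
import torch
from torch import nn

embedding_dim = 768  # ViT-Base embedding dimension (illustrative)
num_classes = 3      # arbitrary number of classes for the sketch

# What the tutorial notebook builds: LayerNorm then a single Linear layer.
notebook_head = nn.Sequential(
    nn.LayerNorm(normalized_shape=embedding_dim),
    nn.Linear(in_features=embedding_dim, out_features=num_classes),
)

# What Appendix D.3 describes at pre-training time: an MLP with one
# hidden layer and a tanh non-linearity.
paper_pretrain_head = nn.Sequential(
    nn.Linear(in_features=embedding_dim, out_features=embedding_dim),
    nn.Tanh(),
    nn.Linear(in_features=embedding_dim, out_features=num_classes),
)

x = torch.randn(1, embedding_dim)    # e.g. the [class] token embedding
print(notebook_head(x).shape)        # torch.Size([1, 3])
print(paper_pretrain_head(x).shape)  # torch.Size([1, 3])
```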
What am I missing?
Thank you in advance.