
Attention is all you need (to implement)

A decade into the attention craze, this repository retraces the era of deep learning step by step, starting from the paper that started it all for LLMs. Some of the content presented pre-dates Transformers, while other work is fairly recent.

  1. Transformer Architecture
  2. Neural Translation
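The centerpiece of the Transformer architecture listed above is scaled dot-product attention. A minimal numpy sketch (illustrative only; shapes and variable names are my own, and a real implementation would batch, mask, and use multiple heads):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core Transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, d_k = 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query
```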

TODO: Pre-Norm vs. Post-Norm vs. FuseNorm

Clearly distinguish the three; read up on FuseNorm.
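For the first two variants in that TODO, the difference is only where LayerNorm sits relative to the residual connection. A hedged numpy sketch (my own naming; `halve` is a stand-in for a real sublayer such as attention or the FFN, and FuseNorm is deliberately not sketched since it is still a read-up item):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_norm_block(x, sublayer):
    # Original Transformer: norm is applied AFTER the residual addition
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    # Pre-norm variant: norm is applied to the sublayer INPUT; the residual
    # path stays un-normalized, which tends to stabilize deep-stack training
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 8))
halve = lambda h: 0.5 * h     # hypothetical stand-in sublayer
print(post_norm_block(x, halve).shape, pre_norm_block(x, halve).shape)
```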

TODO: Reasoning MVP