
Attention is all you need (to implement)

A decade into the attention craze, this repository retraces the era of deep learning step by step, starting from the paper that started it all for LLMs. Some of the content presented pre-dates Transformers, while other work is fairly recent.

  1. Transformer Architecture
  2. Neural Translation
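The centerpiece of the Transformer architecture listed above is scaled dot-product attention. A minimal numpy sketch (illustrative only; shapes and variable names are my own, and a real implementation would batch, mask, and use multiple heads):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core Transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, d_k = 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query
```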

TODO: Pre-Norm vs. Post-Norm vs. FuseNorm

Clearly distinguish the three; read up on FuseNorm.
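For the first two variants in that TODO, the difference is only where LayerNorm sits relative to the residual connection. A hedged numpy sketch (my own naming; `halve` is a stand-in for a real sublayer such as attention or the FFN, and FuseNorm is deliberately not sketched since it is still a read-up item):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_norm_block(x, sublayer):
    # Original Transformer: norm is applied AFTER the residual addition
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    # Pre-norm variant: norm is applied to the sublayer INPUT; the residual
    # path stays un-normalized, which tends to stabilize deep-stack training
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 8))
halve = lambda h: 0.5 * h     # hypothetical stand-in sublayer
print(post_norm_block(x, halve).shape, pre_norm_block(x, halve).shape)
```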

TODO: Reasoning MVP