Attention Mechanisms
- Implemented: Multi-Head Attention with scaled dot-product attention, from the "Attention Is All You Need" paper (a minimal sketch follows this list)
- Implemented: Grouped Query Attention as in Llama models; unlike the Llama implementation, mine still uses plain scaled dot-product attention (sketch below)
- Implemented: Multi-Head Latent Attention as in DeepSeek models (sketch below)
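
A minimal sketch of multi-head scaled dot-product attention in the spirit of "Attention Is All You Need". The module and parameter names (`d_model`, `n_heads`) are illustrative assumptions, not necessarily this repository's actual API:

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Project and split into heads: (batch, heads, seq, d_head)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_head)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).contiguous().view(b, t, -1)
        return self.out_proj(out)
```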
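
A minimal sketch of Grouped Query Attention: there are fewer key/value heads than query heads (`n_kv_heads < n_heads`), and each K/V head is shared by a group of query heads. The attention itself is still plain scaled dot product; names are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert d_model % n_heads == 0 and n_heads % n_kv_heads == 0
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.d_head)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.out_proj = nn.Linear(n_heads * self.d_head, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)
        # Share each K/V head across a group of query heads
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        out = (scores.softmax(dim=-1) @ v).transpose(1, 2).contiguous().view(b, t, -1)
        return self.out_proj(out)
```

The point of the smaller K/V projections is a smaller KV cache at inference time while keeping the full number of query heads.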
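
A minimal sketch of Multi-Head Latent Attention as described for DeepSeek models: keys and values are reconstructed from a shared low-rank latent (`kv_lora_rank`), which is what would be cached at inference time. The decoupled RoPE path used by DeepSeek is omitted for brevity, and all names here are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class MultiHeadLatentAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, kv_lora_rank: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project hidden states to a small shared KV latent...
        self.kv_down = nn.Linear(d_model, kv_lora_rank)
        # ...then up-project the latent to per-head keys and values
        self.k_up = nn.Linear(kv_lora_rank, d_model)
        self.v_up = nn.Linear(kv_lora_rank, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        c_kv = self.kv_down(x)  # (b, t, kv_lora_rank): the cacheable latent
        k = self.k_up(c_kv).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(c_kv).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        out = (scores.softmax(dim=-1) @ v).transpose(1, 2).contiguous().view(b, t, -1)
        return self.out_proj(out)
```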