I was comparing the rotary embedding implementation in this repository with the implementations in the official Llama and Deepseek repositories using this Jupyter notebook: link. In the Llama and Deepseek repositories, complex multiplication is used to perform the rotation of the q and k values, whereas it is implemented more explicitly here. Mathematically, I understand these methods are equivalent, since rotating a pair $(x_1, x_2)$ by an angle $\theta$ can be written as

$$(x_1 + i x_2)(\cos\theta + i\sin\theta) = (x_1\cos\theta - x_2\sin\theta) + i\,(x_2\cos\theta + x_1\sin\theta)$$

- LHS: Used in the Llama and Deepseek implementations
- RHS: Used in the GPT-Fast implementation
As demonstrated in the notebook, the complex multiplication approach is significantly faster. Maybe I'm missing something, but is there a reason the explicit method is preferred here?
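For reference, here is a minimal sketch of the two formulations being compared. It is not the exact code from either repository; the shapes and helper names (`apply_rope_complex`, `apply_rope_explicit`) are illustrative, but it shows that the complex-multiplication form and the explicit form produce the same result:

```python
import torch

def apply_rope_complex(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # View the last dimension as complex pairs and rotate via complex multiplication
    # (the Llama/Deepseek-style formulation, LHS above).
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2).type_as(x)

def apply_rope_explicit(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # The same rotation written out on the two components explicitly
    # (the gpt-fast-style formulation, RHS above).
    xshaped = x.float().reshape(*x.shape[:-1], -1, 2)
    x1, x2 = xshaped[..., 0], xshaped[..., 1]
    out = torch.stack((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
    return out.flatten(-2).type_as(x)

# Hypothetical shapes: (batch, seq, heads, head_dim)
B, S, H, D = 2, 16, 4, 64
x = torch.randn(B, S, H, D)
theta = 1.0 / (10000 ** (torch.arange(0, D, 2).float() / D))
angles = torch.outer(torch.arange(S).float(), theta)                 # (S, D/2)
freqs_cis = torch.polar(torch.ones_like(angles), angles)             # complex, (S, D/2)
freqs_cis = freqs_cis.view(1, S, 1, D // 2)
cos = angles.cos().view(1, S, 1, D // 2)
sin = angles.sin().view(1, S, 1, D // 2)

out_complex = apply_rope_complex(x, freqs_cis)
out_explicit = apply_rope_explicit(x, cos, sin)
print(torch.allclose(out_complex, out_explicit, atol=1e-5))          # True
```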