Hello,

My understanding is that there are multiple variants for computing RoPE, and since Sebastian is loading weights from HuggingFace, he also has to match their variant (the two-halves variant). Otherwise you'd be rotating the wrong features together in Q and K, and you'd end up with a nasty silent bug: the model still runs, but performance is subpar.

In theory you can pair the features any way you want; if you train from scratch, it doesn't matter. But if you use pretrained weights, you have to be consistent: either change your RoPE computation, or reorder/permute the pretrained Q_w and K_w weights to match your own pairing variant.
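To make the pairing point concrete, here is a minimal NumPy sketch of the two variants for a single attention head. The function names (`rope_angles`, `rope_interleaved`, `rope_two_halves`), the base of 10000, and the toy shapes are my own assumptions for illustration, not code from the book or from HuggingFace. For simplicity the permutation is applied to the Q and K activations; that should be equivalent to permuting the output rows of Q_w and K_w within each head.

```python
import numpy as np

def rope_angles(seq_len, head_dim, base=10000.0):
    # One rotation angle per (position, feature pair); shape (seq_len, head_dim // 2)
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    return np.outer(np.arange(seq_len), inv_freq)

def rope_interleaved(x, angles):
    # Interleaved variant: rotates the pairs (x[2i], x[2i + 1])
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def rope_two_halves(x, angles):
    # Two-halves variant: rotates the pairs (x[i], x[i + head_dim // 2])
    half = x.shape[-1] // 2
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

seq_len, head_dim = 4, 8  # toy sizes, chosen only for the demo
rng = np.random.default_rng(0)
q = rng.standard_normal((seq_len, head_dim))
k = rng.standard_normal((seq_len, head_dim))
angles = rope_angles(seq_len, head_dim)

# Reference: features laid out for the interleaved variant
scores_ref = rope_interleaved(q, angles) @ rope_interleaved(k, angles).T

# Permutation that moves even-indexed features to the first half and
# odd-indexed features to the second half
perm = np.concatenate([np.arange(0, head_dim, 2), np.arange(1, head_dim, 2)])

# Consistent: permute the features, then use the two-halves variant
scores_permuted = rope_two_halves(q[:, perm], angles) @ rope_two_halves(k[:, perm], angles).T

# Inconsistent: two-halves variant on features laid out for the interleaved one
scores_mismatched = rope_two_halves(q, angles) @ rope_two_halves(k, angles).T

print(np.allclose(scores_ref, scores_permuted))    # consistent pairing
print(np.allclose(scores_ref, scores_mismatched))  # mismatched pairing
```

In this sketch the consistent case reproduces the reference attention scores, while the mismatched case runs without any error and yields different off-diagonal scores, which is the kind of silent bug described above.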

There was a thread on the H…

Answer selected by rasbt