
Conversation

@pytorchbot
Collaborator

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #6646
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/helunwencser/71/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/helunwencser/71/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/helunwencser/68/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/helunwencser/71/orig
@diff-train-skip-merge

Pull Request resolved: #6560

Right now, rope-related code is scattered across several places in `llama_transformer`, which makes it hard to change anything rope-related.

This PR moves all rope-related logic into its own module.
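For illustration, here is a minimal sketch of the kind of logic such a module can own (frequency precomputation plus q/k rotation). This is not the actual executorch code; the function names and shapes are assumptions based on the standard llama-style rope.

```python
import torch


def precompute_freqs_cis(dim: int, max_seq_len: int, theta: float = 10000.0) -> torch.Tensor:
    # One complex rotation per (position, feature pair): exp(i * pos * freq).
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(max_seq_len).float()
    return torch.polar(torch.ones(max_seq_len, dim // 2), torch.outer(t, freqs))


def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: [batch, seq_len, n_heads, head_dim]; rotate adjacent feature pairs by position.
    x_ = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    rot = freqs_cis[: x_.shape[1]].view(1, x_.shape[1], 1, x_.shape[-1])
    return torch.view_as_real(x_ * rot).flatten(-2).type_as(x)
```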
ghstack-source-id: 255543205

Differential Revision: [D65173598](https://our.internmc.facebook.com/intern/diff/D65173598/)
Pull Request resolved: #6646

AttentionSink tracks tokens' positions in the KVCache rather than their positions in the original text. When tokens are shifted in the KVCache, the position embeddings of q and k therefore need to be updated.

In the original [implementation](https://github.com/mit-han-lab/streaming-llm) of AttentionSink with rope, the original q and k are cached in the KVCache and the position embedding is applied during inference.

This PR adds `RopeWithAttentionSink` instead. It assumes that q and k have already been encoded with their original positions; when tokens are shifted, it only re-applies the position delta (see the sketch below). This has two benefits:
- it minimizes code changes, since the existing `llama_transformer` already applies the rope embedding before updating the KVCache
- it avoids a performance regression for tokens that are not shifted, because their cached entries do not need to be re-encoded
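Below is a hypothetical sketch of the re-rotation idea, not the actual `RopeWithAttentionSink` API: because rope is a per-position complex rotation, a cached key that was encoded at position `p` can be moved to `p - shift` by multiplying with the rotation for the position delta, so unshifted tokens need no work at all.

```python
import torch


def rerotate_k(k: torch.Tensor, freqs_cis: torch.Tensor, shift: int) -> torch.Tensor:
    # k: [batch, seq_len, n_heads, head_dim], already rope-encoded at its original positions.
    # freqs_cis: [max_seq_len, head_dim // 2] complex rotations, as in precompute_freqs_cis above.
    k_ = torch.view_as_complex(k.float().reshape(*k.shape[:-1], -1, 2))
    # Moving every token back by `shift` positions multiplies by exp(-i * f * shift),
    # i.e. the conjugate of the rotation at position `shift`.
    delta = freqs_cis[shift].conj()
    return torch.view_as_real(k_ * delta).flatten(-2).type_as(k)
```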
ghstack-source-id: 255579838

Differential Revision: [D65366440](https://our.internmc.facebook.com/intern/diff/D65366440/)

pytorch-bot bot commented Nov 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7114

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 923e31e with merge base b8fbc48:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Nov 27, 2024
Base automatically changed from gh/helunwencser/68/orig to main November 27, 2024 20:02
@kirklandsign merged commit 6b73841 into main Nov 27, 2024
40 checks passed
@kirklandsign deleted the gh/helunwencser/71/orig branch November 27, 2024 20:03