[Executorch][llm] Fix ring kv cache when used with quantized kv cache and sdpa #12132
Conversation
When using quantized KV cache and SDPA, there were two bugs:

1. `return_float_values` was not reset on `QuantizedRingKVCache`, so `QuantizedKVCache` returned float values post-dequantization.
2. With a quantized KV cache, the SDPA module stores a reference to the `kv_cache` owned by the attention module. When replacing the KV cache in `Attention`, the reference held by SDPA must be updated as well.

Differential Revision: [D77516823](https://our.internmc.facebook.com/intern/diff/D77516823/)
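To make the two bugs concrete, below is a minimal, self-contained Python sketch of the cache-replacement pattern the description refers to. The class names follow the ones mentioned above, but the constructors and the helper `replace_with_ring_cache` are hypothetical stand-ins, not the actual ExecuTorch implementation.

```python
# Minimal sketch of the two bugs; constructors and the helper below are
# hypothetical stand-ins, not the actual ExecuTorch code.

class QuantizedKVCache:
    def __init__(self, return_float_values: bool = True):
        # When True, the cache dequantizes and returns float values.
        self.return_float_values = return_float_values


class QuantizedRingKVCache(QuantizedKVCache):
    pass


class SDPA:
    def __init__(self, kv_cache: QuantizedKVCache):
        # SDPA keeps its own reference to the cache owned by Attention.
        self.kv_cache = kv_cache


class Attention:
    def __init__(self):
        self.kv_cache = QuantizedKVCache()
        self.sdpa = SDPA(self.kv_cache)


def replace_with_ring_cache(attn: Attention) -> None:
    ring = QuantizedRingKVCache()
    # Bug 1 fix: reset return_float_values on the ring cache; without
    # this, it keeps the default and returns dequantized float values.
    ring.return_float_values = False
    attn.kv_cache = ring
    # Bug 2 fix: SDPA still holds a reference to the old cache; point
    # it at the replacement as well.
    attn.sdpa.kv_cache = ring


attn = Attention()
replace_with_ring_cache(attn)
assert attn.sdpa.kv_cache is attn.kv_cache  # both reference the ring cache
```

Without the last assignment in `replace_with_ring_cache`, `attn.sdpa.kv_cache` would still alias the old `QuantizedKVCache`, which is exactly the stale-reference bug this patch fixes.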
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12132

Note: Links to docs will display an error until the docs builds have been completed.

✅ No failures as of commit ec99dce with merge base cf0bfd2.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D77516823
Merged 2881e7d into gh/kimishpatel/196/base
[Executorch][llm] Fix ring kv cache when used with quantized kv cache and sdpa (#12143)

This PR was created by the merge bot to help merge the original PR into the main branch.

- ghstack PR number: #12132 by @kimishpatel (use this as the source of truth for the PR details, comments, and reviews)
- ghstack PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/base
- ghstack PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/head
- Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/195/orig
- Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/orig

@diff-train-skip-merge

Co-authored-by: Kimish Patel <[email protected]>