[DRAFT] Use mask instead of cond for attention conditional logic #7536

jackzhxng · 2025-01-07T04:40:28Z

Summary

Use masking instead of torch.cond. torch.cond does not allow mutation inside its branches, which forces us to clone the KV cache and creates many unnecessary copies.

This also sidesteps a current limitation: the partitioners do not automatically recurse into conditional subgraphs. With masking, Llama 3.2 MM can be partitioned directly with XNNPack.
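For illustration only, here is a minimal sketch of the general idea, not the code in this PR. It assumes the choice between freshly computed and cached K/V can be expressed as a 0-dim boolean tensor; `select_kv_masked` and `use_new` are hypothetical names.

```python
import torch


def select_kv_masked(new_k, new_v, cache_k, cache_v, use_new):
    """Pick fresh or cached K/V with a mask instead of torch.cond.

    `use_new` is a 0-dim bool tensor (hypothetical name). torch.where
    broadcasts it over every element, so both candidates live in one
    flat graph and no branch subgraphs are created.
    """
    k = torch.where(use_new, new_k, cache_k)
    v = torch.where(use_new, new_v, cache_v)
    # Mutating the cache in place is fine here because this code is
    # not running inside a torch.cond branch.
    cache_k.copy_(k)
    cache_v.copy_(v)
    return k, v


cache_k = torch.zeros(1, 2, 4, 8)
cache_v = torch.zeros(1, 2, 4, 8)
new_k = torch.ones(1, 2, 4, 8)
new_v = torch.full((1, 2, 4, 8), 2.0)

# Predicate False: the cached values are returned and the cache is unchanged.
k_cached, v_cached = select_kv_masked(new_k, new_v, cache_k, cache_v, torch.tensor(False))
# Predicate True: the new values are returned and written into the cache.
k_fresh, v_fresh = select_kv_masked(new_k, new_v, cache_k, cache_v, torch.tensor(True))
```

The trade-off in this sketch is that both candidates must be computed on every call, whereas a true conditional would evaluate only one branch.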

Llama 3.2 MM comparison against XNNPack + KV cache + custom SDPA

| Metric | Before | After |
| --- | --- | --- |
| Activations | 0.95 GB | 0.27 GB |
| .pte size | 60 GB | 30 GB |
| Prefill | 20.91 s | 0.86 s |
| Generation | 0.28 tok/s | 6.26 tok/s |
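
As a quick sanity check on the table, the improvement factors can be computed directly from the reported numbers. The dictionary keys below are made up for this sketch; the values are copied from the table.

```python
# Values copied from the comparison table above.
before = {"activations_gb": 0.95, "pte_size_gb": 60.0, "prefill_s": 20.91, "generation_tok_s": 0.28}
after = {"activations_gb": 0.27, "pte_size_gb": 30.0, "prefill_s": 0.86, "generation_tok_s": 6.26}

improvement = {}
for name in before:
    if name == "generation_tok_s":
        # Throughput: higher is better, so the factor is after / before.
        improvement[name] = after[name] / before[name]
    else:
        # Memory, file size, latency: lower is better, so before / after.
        improvement[name] = before[name] / after[name]

for name, factor in improvement.items():
    print(f"{name}: {factor:.1f}x")
```

This works out to roughly 3.5x less activation memory, a 2x smaller .pte file, about 24x faster prefill, and about 22x higher generation throughput.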

Test plan

Rely on existing regression tests (test_attention and test_kv_cache - about to be merged), which have adequate coverage over kv cache and multi-head attention edge cases.

pytorch-bot · 2025-01-07T04:40:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7536

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 220aec8 with merge base ca32105:

NEW FAILURES - The following jobs have failed:

Check Labels / Check labels (gh)
RuntimeError: Error checking labels: PR does not have required labels
Lint / lintrunner / linux-job (gh)
>>> Lint for extension/llm/modules/test/test_attention.py:
pull / unittest / linux / linux-job (gh)
extension/llm/modules/test/test_attention.py::AttentionTest::test_attention_torch_cond_eager
pull / unittest / macos / macos-job (gh)
extension/llm/modules/test/test_attention.py::AttentionTest::test_attention_torch_cond_eager

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-01-07T04:41:03Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

jackzhxng · 2025-01-07T04:44:33Z

extension/llm/modules/attention.py

+                k, v = self.kv_cache.update(k, v)

-        output = self._sdpa(q, k, v, b, s_x, mask=mask)
+        output = self._sdpa(q, k, v, b, s_x)


Note to self: the refactor removed the mask arg from the `_sdpa` call; make sure to add it back.
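
To show why the dropped argument matters, here is a small sketch. It assumes, without confirmation from the diff, that `_sdpa` ultimately behaves like `torch.nn.functional.scaled_dot_product_attention`; the shapes and mask are invented for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 2, 3, 4)  # (batch, heads, query_len, head_dim)
k = torch.randn(1, 2, 5, 4)  # (batch, heads, key_len, head_dim)
v = torch.randn(1, 2, 5, 4)

# Boolean mask: True means "may attend". Only the first 3 key
# positions are valid here, e.g. the filled part of a KV cache.
mask = torch.zeros(1, 1, 3, 5, dtype=torch.bool)
mask[..., :3] = True

with_mask = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
without_mask = F.scaled_dot_product_attention(q, k, v)

# Without the mask, queries also attend to the 2 invalid key positions,
# so the outputs no longer match.
outputs_differ = not torch.allclose(with_mask, without_mask)

# Masking out positions is equivalent to attending over the valid keys only.
truncated = F.scaled_dot_product_attention(q, k[:, :, :3], v[:, :, :3])
```

Under that assumption, calling attention without the mask silently attends to unfilled cache slots rather than raising an error, which is why the omission is easy to miss.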

Use mask instead of cond for attention conditional logic

220aec8

facebook-github-bot added the CLA Signed label on Jan 7, 2025

jackzhxng closed this Feb 13, 2025
