
Conversation

@sxu sxu commented Oct 6, 2025

Summary: We want to support attention skips. This diff modifies `TransformerBlock` to make `attention_norm` and `attention` optional. Since our export script constructs the `TransformerBlock`s directly, this is enough for our use case. The top-level `Transformer` class still requires a single `attention_type`; making that interface also support attention skips (which needs a different configuration for each layer) is out of scope for this diff.

Differential Revision: D84003431
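
For readers skimming the change, the sketch below illustrates the shape of an optional attention path: a transformer block whose `attention` (and `attention_norm`) may be `None`, in which case the block reduces to its feed-forward path. All names here (`SkippableTransformerBlock`, `SimpleSelfAttention`, the constructor arguments) are illustrative assumptions, not the actual `llama_transformer.py` API from this diff.

```python
# Minimal sketch only: shows how an attention sub-layer can be made optional in a
# transformer block. Class and argument names are hypothetical, not ExecuTorch's API.
from typing import Optional

import torch
import torch.nn as nn


class SimpleSelfAttention(nn.Module):
    """Thin self-attention wrapper so the block can call attention(x)."""

    def __init__(self, dim: int, num_heads: int) -> None:
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.mha(x, x, x, need_weights=False)
        return out


class SkippableTransformerBlock(nn.Module):
    """Transformer block whose attention path is optional (attention skip)."""

    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        attention: Optional[nn.Module] = None,
        attention_norm: Optional[nn.Module] = None,
    ) -> None:
        super().__init__()
        self.attention = attention
        self.attention_norm = attention_norm
        self.ffn_norm = nn.LayerNorm(dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.attention is not None:
            # Usual pre-norm residual attention path.
            h = x + self.attention(self.attention_norm(x))
        else:
            # Attention skip: the block is feed-forward only.
            h = x
        return h + self.feed_forward(self.ffn_norm(h))


if __name__ == "__main__":
    dim, hidden_dim = 64, 256
    # Layer 0 keeps attention; layer 1 skips it by omitting attention/attention_norm.
    layers = nn.Sequential(
        SkippableTransformerBlock(
            dim,
            hidden_dim,
            attention=SimpleSelfAttention(dim, num_heads=4),
            attention_norm=nn.LayerNorm(dim),
        ),
        SkippableTransformerBlock(dim, hidden_dim),
    )
    x = torch.randn(2, 8, dim)
    print(layers(x).shape)  # torch.Size([2, 8, 64])
```

In this sketch each layer is constructed individually, so a layer that should skip attention simply omits `attention` and `attention_norm`, mirroring the spirit of the diff where the export script builds the blocks directly rather than going through the top-level `Transformer` constructor.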

@sxu sxu requested review from jackzhxng and lucylq as code owners October 6, 2025 20:41

pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14826

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 1 Unrelated Failure

As of commit 976b081 with merge base 91f1769:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

@sxu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84003431.

@meta-cla meta-cla bot added the CLA Signed label Oct 6, 2025
@sxu sxu requested a review from billmguo October 6, 2025 20:41
@sxu sxu added the release notes: none label Oct 6, 2025
sxu added a commit to sxu/executorch that referenced this pull request Oct 6, 2025
@sxu sxu force-pushed the export-D84003431 branch from 6d6c3a0 to b22ac81 on October 6, 2025 23:29

meta-codesync bot commented Oct 6, 2025

@sxu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84003431.

@sxu sxu force-pushed the export-D84003431 branch from b22ac81 to 6d9231c on October 7, 2025 16:57
sxu added a commit to sxu/executorch that referenced this pull request Oct 7, 2025
sxu added a commit to sxu/executorch that referenced this pull request Oct 7, 2025
@sxu sxu force-pushed the export-D84003431 branch from 6d9231c to 99767f9 on October 7, 2025 17:38
sxu added a commit to sxu/executorch that referenced this pull request Oct 7, 2025
@sxu sxu force-pushed the export-D84003431 branch from 99767f9 to 05c50fe on October 7, 2025 18:55
@sxu sxu force-pushed the export-D84003431 branch from 05c50fe to 976b081 on October 8, 2025 14:44
@meta-codesync meta-codesync bot merged commit 2672dd3 into pytorch:main Oct 8, 2025
284 of 290 checks passed

Labels

ciflow/trunk, CLA Signed, fb-exported, meta-exported, release notes: none
