
Conversation

@sxu sxu commented Oct 6, 2025

Summary: We want to support attention skips. This diff modifies `TransformerBlock` to make `attention_norm` and `attention` optional. Since our export script constructs the `TransformerBlock`s directly, this is enough for our use case. The top-level `Transformer` class still requires a single `attention_type`; making that interface also support attention skips (which needs a different configuration for each layer) is out of scope for this diff.

Differential Revision: D84003431
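
For readers skimming the change, the sketch below illustrates the shape of an optional attention path: a transformer block whose `attention` (and `attention_norm`) may be `None`, in which case the block reduces to its feed-forward path. All names here (`SkippableTransformerBlock`, `SimpleSelfAttention`, the constructor arguments) are illustrative assumptions, not the actual `llama_transformer.py` API from this diff.

```python
# Minimal sketch only: shows how an attention sub-layer can be made optional in a
# transformer block. Class and argument names are hypothetical, not ExecuTorch's API.
from typing import Optional

import torch
import torch.nn as nn


class SimpleSelfAttention(nn.Module):
    """Thin self-attention wrapper so the block can call attention(x)."""

    def __init__(self, dim: int, num_heads: int) -> None:
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.mha(x, x, x, need_weights=False)
        return out


class SkippableTransformerBlock(nn.Module):
    """Transformer block whose attention path is optional (attention skip)."""

    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        attention: Optional[nn.Module] = None,
        attention_norm: Optional[nn.Module] = None,
    ) -> None:
        super().__init__()
        self.attention = attention
        self.attention_norm = attention_norm
        self.ffn_norm = nn.LayerNorm(dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.attention is not None:
            # Usual pre-norm residual attention path.
            h = x + self.attention(self.attention_norm(x))
        else:
            # Attention skip: the block is feed-forward only.
            h = x
        return h + self.feed_forward(self.ffn_norm(h))


if __name__ == "__main__":
    dim, hidden_dim = 64, 256
    # Layer 0 keeps attention; layer 1 skips it by omitting attention/attention_norm.
    layers = nn.Sequential(
        SkippableTransformerBlock(
            dim,
            hidden_dim,
            attention=SimpleSelfAttention(dim, num_heads=4),
            attention_norm=nn.LayerNorm(dim),
        ),
        SkippableTransformerBlock(dim, hidden_dim),
    )
    x = torch.randn(2, 8, dim)
    print(layers(x).shape)  # torch.Size([2, 8, 64])
```

In this sketch each layer is constructed individually, so a layer that should skip attention simply omits `attention` and `attention_norm`, mirroring the spirit of the diff where the export script builds the blocks directly rather than going through the top-level `Transformer` constructor.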

@sxu sxu requested review from jackzhxng and lucylq as code owners October 6, 2025 20:41

pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14826

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 1 Unrelated Failure

As of commit 976b081 with merge base 91f1769:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

@sxu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84003431.

@meta-cla meta-cla bot added the CLA Signed label Oct 6, 2025
@sxu sxu requested a review from billmguo October 6, 2025 20:41
@sxu sxu added the release notes: none label Oct 6, 2025
sxu added a commit to sxu/executorch that referenced this pull request Oct 6, 2025
@sxu sxu force-pushed the export-D84003431 branch from 6d6c3a0 to b22ac81 on October 6, 2025 23:29

meta-codesync bot commented Oct 6, 2025

@sxu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84003431.

@sxu sxu force-pushed the export-D84003431 branch from b22ac81 to 6d9231c on October 7, 2025 16:57
sxu added a commit to sxu/executorch that referenced this pull request Oct 7, 2025
sxu added a commit to sxu/executorch that referenced this pull request Oct 7, 2025
@sxu sxu force-pushed the export-D84003431 branch from 6d9231c to 99767f9 on October 7, 2025 17:38
sxu added a commit to sxu/executorch that referenced this pull request Oct 7, 2025
@sxu sxu force-pushed the export-D84003431 branch from 99767f9 to 05c50fe on October 7, 2025 18:55
@sxu sxu force-pushed the export-D84003431 branch from 05c50fe to 976b081 on October 8, 2025 14:44
@meta-codesync meta-codesync bot merged commit 2672dd3 into pytorch:main Oct 8, 2025
284 of 290 checks passed

Labels

ciflow/trunk, CLA Signed, fb-exported, meta-exported, release notes: none
