
Conversation

@shewu-quic
Collaborator

Background

We observed that quantizing and compiling the original sha model takes a significant amount of time, while switching to the mha model speeds up this process. We therefore investigated whether the mha model could be converted to sha after quantization. This conversion cannot happen during the to_edge transformation, because splitting the convolution weights into per-head sha weights requires modifying the state_dict, which is not permitted at that stage. Instead, we apply the pass during qnn_preprocess.
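
For illustration, a minimal sketch of the kind of weight splitting involved (the helper below is hypothetical, not the actual pass): splitting a fused multi-head 1x1 convolution projection into per-head convolutions creates new weight tensors, which is exactly the state_dict mutation that to_edge does not allow.

```python
# Minimal sketch (hypothetical helper, not the actual pass): split a fused
# multi-head 1x1 conv projection into per-head convs. The new per-head weight
# tensors are the state_dict mutation that cannot happen during to_edge.
import torch.nn as nn


def split_mha_conv_to_sha(conv: nn.Conv2d, n_heads: int) -> nn.ModuleList:
    head_dim = conv.out_channels // n_heads
    sha_convs = nn.ModuleList()
    for h in range(n_heads):
        sha = nn.Conv2d(
            conv.in_channels, head_dim, kernel_size=1, bias=conv.bias is not None
        )
        # Copy this head's slice of the fused weight into the new conv.
        sha.weight.data.copy_(conv.weight[h * head_dim : (h + 1) * head_dim])
        if conv.bias is not None:
            sha.bias.data.copy_(conv.bias[h * head_dim : (h + 1) * head_dim])
        sha_convs.append(sha)
    return sha_convs
```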

Summary:

  • Added the mha-to-sha conversion pass and applied it in qnn_preprocess
  • Refactored mha in static llama
    • Included spin quant r3 support and masked softmax for the mha model in static llama
    • Combined the n_heads key-value caches into a single cache for each layer to decrease the number of inputs and outputs, which enhances performance (see the sketch after this list).
  • Deprecated ShiftPointer kv updater mode
    • Since each layer now has its own kv cache, the v cache no longer benefits from ShiftPointer, which previously avoided copying the new v cache into the input v cache. To prevent user confusion, ShiftPointer mode has been deprecated.
  • Applied the correct input template for smollm2 135m
  • Corrected the quantization annotation for reshape
  • Removed outdated code from CanonicalizeConv
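
The sketch below illustrates the combined per-layer kv cache; the shapes are assumptions for illustration only, not the exact static llama layout.

```python
# Illustrative shapes only (assumptions, not the exact static llama layout).
import torch

n_heads, head_dim, max_seq_len = 8, 64, 512

# Before: one cache tensor per head per layer -> n_heads graph inputs/outputs.
k_caches = [torch.zeros(1, head_dim, max_seq_len) for _ in range(n_heads)]

# After: one fused cache tensor per layer -> a single graph input/output.
k_cache = torch.zeros(1, n_heads, head_dim, max_seq_len)

# Per-head views are still recoverable inside the graph when needed.
per_head = k_cache.unbind(dim=1)  # n_heads tensors of shape [1, head_dim, max_seq_len]
assert len(per_head) == n_heads
```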

Results

Following the README settings, we tested on SM8750 with QNN 2.37 and compared the new convert_mha_to_sha pass with the original sha structure.

[image: lowering-time comparison between the new convert_mha_to_sha pass and the original sha structure]

@shewu-quic requested a review from cccclai as a code owner on October 29, 2025 06:56
@pytorch-bot

pytorch-bot bot commented Oct 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15438

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 18e7db1 with merge base 3485495:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label on Oct 29, 2025
@shewu-quic
Collaborator Author

@pytorchbot label "release notes: qualcomm"

pytorch-bot bot added the release notes: qualcomm label on Oct 29, 2025
@shewu-quic
Collaborator Author

Hi @cccclai,
This PR migrates the mha2sha transformation from the source level to a pass applied in qnn_preprocess. It significantly improves lowering time, including both quantization and compilation time.
Could you please take a look?

Thanks
