
@shewu-quic
Collaborator

@shewu-quic shewu-quic commented Nov 8, 2024

summary:

  • Support copy op with QNN Reshape
  • Consume mutable buffer in QNN Delegate
  • Set the same memory address for I/O of mutable buffer at runtime
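The third bullet can be illustrated with a minimal Python sketch. It shows why binding a mutable buffer's input and output to the same memory avoids a copy-back between steps. It assumes a toy "graph" that reads a cache and writes cache + 1. toy_graph, run_decode_steps, and the k_cache / k_cache_out names are hypothetical and are not QNN or ExecuTorch APIs.

```python
from typing import Dict, Tuple


def toy_graph(inputs: Dict[str, memoryview], outputs: Dict[str, memoryview]) -> None:
    """Stand-in for a compiled graph: reads the cache input, writes cache + 1 to the output."""
    src, dst = inputs["k_cache"], outputs["k_cache_out"]
    for i in range(len(src)):
        # Element i only depends on input element i, so in-place aliasing is safe here.
        dst[i] = (src[i] + 1) % 256


def run_decode_steps(n_steps: int, alias_io: bool) -> Tuple[bytes, int]:
    """Run n_steps and return (final cache contents, bytes copied between steps)."""
    cache_in = bytearray(4)
    # With aliasing, the output is bound to the very same buffer as the input.
    cache_out = cache_in if alias_io else bytearray(4)
    copied = 0
    for _ in range(n_steps):
        toy_graph({"k_cache": memoryview(cache_in)}, {"k_cache_out": memoryview(cache_out)})
        if not alias_io:
            # Without aliasing, the updated state must be copied back for the next step.
            cache_in[:] = cache_out
            copied += len(cache_out)
    return bytes(cache_in), copied


if __name__ == "__main__":
    print(run_decode_steps(3, alias_io=True))   # (b'\x03\x03\x03\x03', 0)
    print(run_decode_steps(3, alias_io=False))  # (b'\x03\x03\x03\x03', 12)
```

Both modes end in the same state. The non-aliased run pays a copy proportional to the buffer size on every step, and that cost grows with a large KV cache. Aliasing is only safe when the graph never reads an input element after overwriting it, as in the toy graph above.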

Tested this PR with Llama 3.2 1B Instruct (seq_len=512) on SM8650:
[screenshot of results]
Tested mainline for comparison:
[screenshot of results]

@pytorch-bot

pytorch-bot bot commented Nov 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6727

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please check it.

❌ 1 New Failure

As of commit 89af1e0 with merge base 86cb5d7:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 8, 2024
@shewu-quic
Collaborator Author

Hi @cccclai,
This PR delegates mutable buffers to the QNN backend and maintains them there.
I also added a condition that controls whether the delegate consumes mutable buffers.
Please have a look.

Thank you very much :)
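Below is a minimal sketch of what such a condition might look like. It assumes a simplified view of the exported graph's inputs. GraphInput, inputs_consumed_by_delegate, and the consume_mutable_buffer flag name are hypothetical. They are not the actual option added in this PR, and the choice to let the delegate own weights is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class GraphInput:
    """Hypothetical, simplified view of one placeholder in the exported graph."""
    name: str
    kind: str              # illustrative tags: "USER_INPUT", "PARAMETER", "BUFFER"
    mutated: bool = False  # True if the graph writes back to this buffer


def inputs_consumed_by_delegate(
    inputs: Iterable[GraphInput], consume_mutable_buffer: bool
) -> Set[str]:
    """Return the placeholders the delegate would take ownership of."""
    consumed: Set[str] = set()
    for inp in inputs:
        if inp.kind == "USER_INPUT":
            continue  # user inputs are always supplied at runtime
        if inp.kind == "BUFFER" and inp.mutated and not consume_mutable_buffer:
            continue  # flag off: leave the mutable state to the host runtime
        consumed.add(inp.name)  # weights, constant buffers, and (if enabled) mutable buffers
    return consumed


if __name__ == "__main__":
    graph = [
        GraphInput("tokens", "USER_INPUT"),
        GraphInput("wq", "PARAMETER"),
        GraphInput("k_cache", "BUFFER", mutated=True),
    ]
    print(sorted(inputs_consumed_by_delegate(graph, consume_mutable_buffer=True)))   # ['k_cache', 'wq']
    print(sorted(inputs_consumed_by_delegate(graph, consume_mutable_buffer=False)))  # ['wq']
```

Keeping the flag opt-in means that, when it is off, the mutable buffer remains an ordinary graph input/output outside the delegate.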

Contributor

@cccclai cccclai left a comment

This looks very solid, thanks!

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Nov 11, 2024

Hey, it seems like this is breaking the CI job test-llama-runner-qnn-linux. Can you take a look?

@shewu-quic
Collaborator Author

> Hey, it seems like this is breaking the CI job test-llama-runner-qnn-linux. Can you take a look?

Something seems to go wrong when the mutable buffer is delegated without quantization.
I am trying to figure out the root cause. If it is not easy to fix, I will disable mutable buffer delegation in floating-point (fp) mode.

@shewu-quic
Collaborator Author

It seems that the delegated mutable buffer is not removed from the outputs.
Tracing back, I found that the mutable buffer doesn't exist in original_program.state_dict, so it is never added to output_specs_to_delete. Do you have any idea why?
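The symptom can be modeled in a few lines. This is a sketch under stated assumptions, not the ExecuTorch lowering code: OutputSpec, the string kinds, and the constants fallback are hypothetical. One possible, unconfirmed explanation is that nn.Module.state_dict() leaves out buffers registered with persistent=False. If the KV cache is registered that way and the exported program's state_dict follows the same rule, a lookup keyed only on state_dict never selects the cache.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Set


@dataclass(frozen=True)
class OutputSpec:
    """Hypothetical, simplified stand-in for an exported program's output spec."""
    kind: str              # illustrative tags: "USER_OUTPUT" or "BUFFER_MUTATION"
    target: Optional[str]  # name of the mutated buffer; None for user outputs


def output_specs_to_delete(
    output_specs: List[OutputSpec],
    delegated_buffers: Set[str],
    state_dict: Mapping[str, object],
    constants: Optional[Mapping[str, object]] = None,
) -> List[OutputSpec]:
    """Pick BUFFER_MUTATION outputs whose buffer is now owned by the delegate."""
    known: Set[str] = set(state_dict)
    if constants is not None:
        # Assumption: buffers missing from state_dict (e.g. persistent=False) can be found here.
        known |= set(constants)
    return [
        spec
        for spec in output_specs
        if spec.kind == "BUFFER_MUTATION"
        and spec.target in delegated_buffers
        and spec.target in known
    ]


if __name__ == "__main__":
    specs = [OutputSpec("BUFFER_MUTATION", "k_cache"), OutputSpec("USER_OUTPUT", None)]
    state_dict = {"wq": object()}  # the cache buffer is absent from state_dict
    constants = {"k_cache": object()}
    print(output_specs_to_delete(specs, {"k_cache"}, state_dict))  # []
    print(output_specs_to_delete(specs, {"k_cache"}, state_dict, constants))
    # [OutputSpec(kind='BUFFER_MUTATION', target='k_cache')]
```

With only state_dict consulted, the list is empty, so the mutation output survives. That matches the behavior described in the comment above.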

@shewu-quic shewu-quic force-pushed the dev1/hutton/delegated_mutable_buffer branch from 7f236c3 to da1df61 November 19, 2024 03:23
@github-actions

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@shewu-quic
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot

pytorch-bot bot commented Nov 19, 2024

Didn't find the following labels among the repository labels: topic: not user facing

@shewu-quic shewu-quic force-pushed the dev1/hutton/delegated_mutable_buffer branch from da1df61 to 89af1e0 November 19, 2024 03:26
@cccclai
Contributor

cccclai commented Nov 19, 2024

It seems like the label requirement is new... I'll check how to resolve it.

@cccclai
Contributor

cccclai commented Feb 7, 2025

Hello, is this PR still needed? Assuming yes, but we're focusing on static llama now...

@shewu-quic
Collaborator Author

> Hello, is this PR still needed? Assuming yes, but we're focusing on static llama now...

Yes, I think we can close it. Thanks.

@shewu-quic shewu-quic closed this Feb 7, 2025
cccclai pushed a commit that referenced this pull request Jun 20, 2025
…le buffer issue (#11782)

Summary:
- Add a parameter to support mutable buffer delegation in QNN Backend
  - Set the same memory address for I/O of mutable buffer at runtime
  - Ref: #6727
- Avoid annotating the input node because mutable buffers will be folded during the convert_pt2e process.
- Deprecate use_legacy_export in executorch llama


cc @cccclai @winskuo-quic @cbilgin
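One way to read the "avoid annotating the input node" bullet is that the annotator skips placeholder nodes that are mutable buffers. The sketch below illustrates that reading. Node, nodes_to_annotate, and the node names are hypothetical. PT2E quantizers annotate torch.fx graph nodes, and only the op-kind strings here follow torch.fx naming.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Node:
    """Hypothetical, minimal stand-in for a graph node (op kinds follow torch.fx naming)."""
    name: str
    op: str  # "placeholder", "call_function" or "output"


def nodes_to_annotate(nodes: List[Node], mutable_buffers: Set[str]) -> List[str]:
    """Return the nodes a quantization annotator would visit, skipping mutable-buffer inputs."""
    selected = []
    for node in nodes:
        if node.op == "output":
            continue
        if node.op == "placeholder" and node.name in mutable_buffers:
            continue  # this input is expected to be folded away during convert_pt2e
        selected.append(node.name)
    return selected


if __name__ == "__main__":
    graph = [
        Node("tokens", "placeholder"),
        Node("k_cache", "placeholder"),
        Node("index_put", "call_function"),
        Node("output", "output"),
    ]
    print(nodes_to_annotate(graph, {"k_cache"}))  # ['tokens', 'index_put']
```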
hinriksnaer pushed a commit to hinriksnaer/executorch that referenced this pull request Jun 26, 2025

Labels: CLA Signed
Projects: None yet
3 participants