Conversation

@lucylq lucylq commented Oct 7, 2025

Summary

Enable quantization with program-data separation.

To select weights for separation, we tag nodes on the eager model. After quantization, q/dq (quantize/dequantize) nodes are generated, and they do not carry the external tags that their inputs have. This PR propagates those tags to the q/dq nodes, so that quantized weights are moved to the external data file and can be shared.
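The propagation step can be sketched as a small graph pass. This is an illustrative stand-in, not the actual ExecuTorch pass: the `Node` class, the `"external_tag"` meta key, and the op names in `QDQ_OPS` are all assumptions chosen to show the idea, not real ExecuTorch identifiers.

```python
# Minimal sketch of the tag-propagation idea. Node is a stand-in for a
# torch.fx node; "external_tag" and the op names are illustrative keys,
# not the actual ExecuTorch metadata keys.

class Node:
    def __init__(self, name, target, args=(), meta=None):
        self.name = name
        self.target = target   # op name, e.g. "quantize_per_tensor"
        self.args = args       # input Nodes
        self.meta = meta or {}

QDQ_OPS = {"quantize_per_tensor", "dequantize_per_tensor"}

def propagate_external_tags(nodes):
    """Copy the external tag from a q/dq node's input onto the q/dq node
    itself, so a later pass moves the quantized weight into the external
    data file along with its source tensor."""
    for node in nodes:  # nodes are assumed to be in topological order
        if node.target in QDQ_OPS and "external_tag" not in node.meta:
            for inp in node.args:
                tag = inp.meta.get("external_tag")
                if tag is not None:
                    node.meta["external_tag"] = tag
                    break
    return nodes

# A weight tagged for separation, followed by the q/dq nodes inserted
# by quantization; the tag flows weight -> q -> dq.
weight = Node("weight", "get_attr", meta={"external_tag": "model"})
q = Node("q", "quantize_per_tensor", args=(weight,))
dq = Node("dq", "dequantize_per_tensor", args=(q,))
propagate_external_tags([weight, q, dq])
print(dq.meta["external_tag"])  # → model
```

Because the list is walked in topological order, the q node inherits the tag from the weight first, and the dq node then inherits it from the q node.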

Test plan

```
python -m unittest executorch.backends.xnnpack.test.passes.test_propagate_custom_meta_pass
```


pytorch-bot bot commented Oct 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14864

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 3 Unrelated Failures

As of commit db612db with merge base d00279d:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 7, 2025
@lucylq lucylq requested a review from GregoryComer October 7, 2025 20:34

github-actions bot commented Oct 7, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@lucylq lucylq force-pushed the lfq.xnnpack-qdq-tags branch from fa97c1b to bc8f182 on October 7, 2025 20:41
@lucylq lucylq marked this pull request as ready for review October 7, 2025 20:42
@lucylq lucylq requested a review from digantdesai as a code owner October 7, 2025 20:42
@lucylq lucylq force-pushed the lfq.xnnpack-qdq-tags branch from bc8f182 to 9ef58fe on October 8, 2025 21:53
@lucylq lucylq requested a review from cccclai as a code owner October 8, 2025 21:53
```python
    return self.converted_graph.forward(*inputs)

class Quantize_(Stage):
    """
```
Contributor Author

@GregoryComer let me know if I should move this into the test class, instead of the test harness, or rename.

Contributor

FWIW, I like it here.

@lucylq lucylq force-pushed the lfq.xnnpack-qdq-tags branch from 9ef58fe to f7515a9 on October 8, 2025 22:02
Member

@GregoryComer GregoryComer left a comment


Thanks for adding the tester stage! I don't really have a better idea on the name, and it's not part of the public API surface, so I'm good with these changes.

```python
exec = tester.get_artifact()
program_buffer = exec.buffer
self.assertEqual(len(exec._tensor_data), 1)
data_buffer = bytes(exec._tensor_data.pop("model"))
```
Contributor

Do we want to assert its size, just to make sure it is indeed the quantized weight tensor? I would also like (somehow) to validate that we didn't put it in the blob, perhaps by asserting that the blob size is < weight_size (if we have large-ish weights).

Contributor Author
@lucylq lucylq Oct 9, 2025

Thanks - I'll add a check on size.

Re validating that it's not in the blob, we can check that forward fails when we do not pass in the data buffer.

Contributor Author

@digantdesai added a check on size, and check on accuracy.

Verified locally that we're missing the weight if we do not pass in the data buffer. The test segfaults after ~4 runs, though; maybe something isn't cleaned up properly in pybindings. Left that as a TODO for now.
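The size checks discussed above can be sketched as follows. This is a hedged illustration, not the actual test code: `check_separation` and its parameters are hypothetical names, and the dummy byte buffers stand in for the real `program_buffer`/`data_buffer` artifacts from the tester.

```python
# Hedged sketch of the separation checks suggested in review.
# check_separation and weight_nbytes are hypothetical names; the real
# test works on the lowered program artifact instead of dummy bytes.

def check_separation(program_buffer: bytes, data_buffer: bytes,
                     weight_nbytes: int) -> None:
    # The external data file must be at least as large as the weight
    # it is supposed to hold.
    assert len(data_buffer) >= weight_nbytes
    # With large-ish weights, the program blob should be smaller than
    # the weight itself, i.e. the weight was not also serialized into
    # the program.
    assert len(program_buffer) < weight_nbytes

# Toy illustration: a small program blob and an external file holding
# a 4 KiB quantized weight.
check_separation(program_buffer=b"\x00" * 512,
                 data_buffer=b"\x00" * 4096,
                 weight_nbytes=4096)
```

The complementary runtime check, verifying that forward fails without the data buffer, catches the case where the weight was silently duplicated into the program.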

Contributor
@digantdesai digantdesai left a comment

Thanks @lucylq. I hope we have quantized LoRA working OK with XNNPACK now?


lucylq commented Oct 9, 2025

@digantdesai Yes! Thanks a lot for the help. There were a few issues with export_llama and the emitter, but I was able to generate quantized LoRA files. They work well on desktop; I still need to debug load issues on Android.

@lucylq lucylq force-pushed the lfq.xnnpack-qdq-tags branch 7 times, most recently from 714f81a to 0c08ac7 on October 13, 2025 02:57
@lucylq lucylq force-pushed the lfq.xnnpack-qdq-tags branch from 0c08ac7 to db612db on October 13, 2025 16:09
@lucylq lucylq merged commit 19c9ff3 into main Oct 14, 2025
209 of 224 checks passed
@lucylq lucylq deleted the lfq.xnnpack-qdq-tags branch October 14, 2025 00:56
Gasoonjia pushed a commit that referenced this pull request Oct 14, 2025
### Summary
Enable quantization with program-data separation.

To select weights for separation, we tag nodes on the eager model. After
quantization, qdq nodes are generated. These do not contain the external
tags that their inputs have. This PR propagates the tags to the qdq
nodes, so that quant weights are moved to external file and can be
shared.

### Test plan
```
python -m unittest executorch.backends.xnnpack.test.passes.test_propagate_custom_meta_pass
```